vpc-ccg / sedef

Identification of segmental duplications in the genome
MIT License

Helper script to run on nonhuman data #12

Closed mchaisso closed 5 years ago

mchaisso commented 5 years ago

I'd like to run sedef on nonhuman genomes, but I'm too lazy to adapt the code to handle contigs whose names do not start with "chr", or to run the manual transmission commands listed in the readme. Can you write a shell script that runs this on files with arbitrary chromosome names?

calkan commented 5 years ago

Doesn't this help?

"You can pass a regex to parameter -e / --exclude to sedef.sh to modify this criteria. For example, to use FASTAs with NCBI format pass -e "^[0-9A-Z]+$"."

mchaisso commented 5 years ago

Yes and no. First, the regex specifies which contigs to include, so calling the option "--exclude" is strange. Next, please just add an "--all" option so I don't have to write -e "^.*". If no valid contigs are found, a help message about specifying "--all" should be printed.
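The behaviour requested here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how sedef.sh is implemented: the function name `select_contigs`, the `keep_all` argument (standing in for the proposed "--all"), and the default pattern `^chr` are all hypothetical. The pattern is treated as an include filter on contig names, as described in the comment above.

```python
import re

# Hypothetical default: keep only contig names that start with "chr".
# The real default used by sedef.sh may differ.
DEFAULT_PATTERN = r"^chr"


def select_contigs(names, pattern=DEFAULT_PATTERN, keep_all=False):
    """Return the contig names kept by an include-style regex filter."""
    if keep_all:
        # Stand-in for the proposed "--all" option: skip filtering entirely.
        return list(names)
    regex = re.compile(pattern)
    kept = [name for name in names if regex.search(name)]
    if not kept:
        # Mirror the requested help message when nothing matches.
        raise ValueError(
            f"no contig names match {pattern!r}; "
            "pass --all to keep every contig"
        )
    return kept


names = ["1", "X", "scaffold_12"]
print(select_contigs(names, pattern=r"^[0-9A-Z]+$"))  # ['1', 'X']
print(select_contigs(names, keep_all=True))  # ['1', 'X', 'scaffold_12']
```

With the hypothetical default pattern, `select_contigs(names)` raises the error that points the user to "--all", because none of the example names start with "chr".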

mchaisso commented 5 years ago

Otherwise, pretty glad to have this method out there; nobody wants to run WGAC.

calkan commented 5 years ago

OK, assigning @inumanag, since I have no knowledge of the inner workings of the code.

inumanag commented 5 years ago

Please try -t translation.fa (this was added to support non-human genomes and various in-house assemblies). Let me know if it works!

inumanag commented 5 years ago

Commit 3d67e9d fixed this.