mhuttner / miRA

GNU General Public License v2.0

issue with chromosome not found #2

Open manishbiotechie opened 6 years ago

manishbiotechie commented 6 years ago

Hi, I am trying to run miRA, but I am getting the following error:

2 lines of the SAM file were ignored because they were invalid
The Chromosome was not found as a @SQ entry

The test run was successful and I got the desired results. Kindly help.

jengelmann commented 6 years ago

Hi, thanks for using miRA. The SAM file header needs to have one line for each chromosome or scaffold that appears in your SAM file. For example, for scaffold_1 this could look like this (SN indicating the sequence name and LN the sequence length):

@SQ SN:scaffold_1 LN:6494944

For a chromosome:

@SQ SN:chr_1 LN:8443987

Reads mapped to chromosomes/scaffolds not defined in the header will be ignored.

Hope that helps,
Julia
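One way to find the offending reads is to list every reference name used by an alignment line that has no matching @SQ entry. The following is a minimal Python sketch, not part of miRA: the function name `missing_sq_names` and the sample records (including the name Ca_LG_9) are made up for illustration, and it assumes a plain-text, tab-separated SAM file.

```python
def missing_sq_names(sam_lines):
    """Return reference names used by alignments but not declared as @SQ SN: entries."""
    declared = set()  # names taken from @SQ header lines
    used = set()      # names taken from column 3 (RNAME) of alignment lines
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        declared.add(field[3:])
            continue
        cols = line.split("\t")
        # RNAME "*" marks an unmapped read, so it needs no @SQ entry
        if len(cols) > 2 and cols[2] != "*":
            used.add(cols[2])
    return sorted(used - declared)


# Hypothetical records: Ca_LG_9 is used by a read but never declared
example = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "@SQ\tSN:Ca_LG_1\tLN:39901017\n",
    "read1\t0\tCa_LG_1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t0\tCa_LG_9\t200\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
]
print(missing_sq_names(example))
```

For a real file, an open file handle can be passed in place of the list, since the function only iterates over lines.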

manishbiotechie commented 6 years ago

Thanks for your quick response.

@HD VN:1.0 SO:unsorted
@SQ SN:Ca_LG_1 LN:39901017
@SQ SN:Ca_LG_2 LN:33233457
@SQ SN:Ca_LG_3 LN:42267542
@SQ SN:Ca_LG_4 LN:54992815
@SQ SN:Ca_LG_5 LN:45819701
@SQ SN:Ca_LG_6 LN:54841389
@SQ SN:Ca_LG_7 LN:45279478
@SQ SN:Ca_LG_8 LN:17664089
@SQ SN:scaffold17332 LN:1136

The header of my SAM file is listed above and appears to be all right, so the problem must lie somewhere else. Kindly help.


mevers commented 6 years ago

Can you double-check that you have matching chromosome names from the BAM and FASTA files? For example, we've come across the situation where reads were aligned against a genome with chromosome names "chr_1", "chr_2", etc. (according to the BAM header), while the reference FASTA file had sequences for chromosomes with names "1", "2", etc. In that case, miRA won't be able to locate the matching chromosome for BAM file-derived expression loci.
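A quick way to run this check is to compare the @SQ names in the SAM header against the sequence names in the FASTA file. Below is a minimal Python sketch under two assumptions: the SAM input is plain text and tab-separated, and a FASTA sequence name is the header text after ">" up to the first whitespace (the common convention; how miRA itself parses FASTA headers is not stated in this thread). All function names are illustrative.

```python
def sq_names(sam_lines):
    """Collect the SN: values from the @SQ lines of a SAM header."""
    names = set()
    for line in sam_lines:
        if not line.startswith("@"):
            break  # the header ends at the first alignment line
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def fasta_names(fasta_lines):
    """Collect sequence names: header text after '>' up to the first whitespace."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    }


def names_missing_from_fasta(sam_lines, fasta_lines):
    """Return @SQ names that have no identically named FASTA sequence."""
    return sorted(sq_names(sam_lines) - fasta_names(fasta_lines))


# The mismatch described above: "chr_1" in the alignment header, "1" in the FASTA
sam_header = ["@SQ\tSN:chr_1\tLN:8443987\n", "@SQ\tSN:chr_2\tLN:100\n"]
fasta = [">1\n", "ACGT\n", ">2 some description\n", "ACGT\n"]
print(names_missing_from_fasta(sam_header, fasta))
```

An empty result means every name in the header has an exact counterpart in the FASTA file; anything listed is a name that would not be found.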

manishbiotechie commented 6 years ago

I did check; both files have similar chromosome names.

mevers commented 6 years ago

I'm not sure what you mean by "similar chromosome names"; the names need to match exactly, i.e. for every name in the BAM file, the FASTA file needs to contain a sequence with the same name.

I noticed your BAM file is not (position) sorted. Can you re-run miRA using a sorted BAM file and report whether the error persists?
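The header posted earlier in the thread declares SO:unsorted. To confirm whether the records really are out of order, a small check like the following can help. This is a hedged Python sketch for plain-text, tab-separated SAM input, with an illustrative function name: it treats "position sorted" as the usual coordinate order (references in @SQ order, then ascending POS, unmapped reads last).

```python
def is_coordinate_sorted(sam_lines):
    """Return True if alignments follow @SQ order, then ascending POS."""
    order = {}       # reference name -> rank from its @SQ line position
    previous = None  # sort key of the previous alignment
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        order.setdefault(field[3:], len(order))
            continue
        cols = line.split("\t")
        rname, pos = cols[2], int(cols[3])
        if rname == "*":
            key = (len(order) + 1, 0)  # unmapped reads go last
        else:
            # names missing from the header rank after all declared ones
            key = (order.get(rname, len(order)), pos)
        if previous is not None and key < previous:
            return False
        previous = key
    return True


header = ["@SQ\tSN:A\tLN:1000\n", "@SQ\tSN:B\tLN:1000\n"]
in_order = header + [
    "r1\t0\tA\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\tB\t50\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
]
out_of_order = header + list(reversed(in_order[2:]))
print(is_coordinate_sorted(in_order), is_coordinate_sorted(out_of_order))
```

The check only reports the order; the sorting itself is typically done with an external tool such as `samtools sort`.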