molgenis / CoNVaDING

Copy Number Variation Detection In Next-generation sequencing Gene panels was designed for small (single-exon) copy number variation (CNV) detection in high coverage next-generation sequencing (NGS) data
GNU Lesser General Public License v3.0

Error running convading: Incorrect BED file format, please check your BED file before processing #45

Open lakhujanivijay opened 4 years ago

lakhujanivijay commented 4 years ago

I am getting two types of errors while running CoNVaDING. My command is:

perl/CoNVaDING-1.2.1/CoNVaDING.pl \
  -mode StartWithBam \
  -inputDir /00_bam_files \
  -controlsDir 00_bam_files \
  -outputDir 01_coverage_out \
  -bed Target_annotated_intervals.bed

The errors are:

awk: cmd. line:1: fatal: division by zero attempted

Incorrect BED file format, please check your BED file before processing.

As for "fatal: division by zero", I know it is a known issue, and I am waiting for a response here: https://github.com/molgenis/CoNVaDING/issues/28#issuecomment-591811274

However, I don't know the reason for this error:

Incorrect BED file format, please check your BED file before processing.

Can you please help?

ljohansson commented 4 years ago

Did you create your BED file according to the specifications? https://molgenis.gitbooks.io/convading/#create-normalized-count-files

Can you paste a few lines of your BED file here?

lakhujanivijay commented 4 years ago

This is how my BED file looks:

1   69090   70008   OR4F5
1   621095  622034  OR4F16
1   861321  861393  SAMD11

mmterpstra commented 4 years ago

CoNVaDING tries to match each line against this pattern:

$line =~ m/.+\t[0-9]{1,}\t[0-9]{1,}\t[A-Za-z0-9]{1,}.+/gs

Can you run the code below and paste the results here?

perl -wne 'warn "Invalid line !!!${_}!!! at $. " if(not(m/.+\t[0-9]{1,}\t[0-9]{1,}\t[A-Za-z0-9]{1,}.+/gs));' Target_annotated_intervals.bed

lakhujanivijay commented 4 years ago

@mmterpstra Interestingly, the above command returns nothing.

mmterpstra commented 4 years ago

The code is confusing. Maybe run dos2unix on the file, or try the following, which is closer to how the file is actually parsed:

perl -wne 'my @bed =<ARGV>;my $i = 0; map {s/(?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])//;chomp ;warn "Invalid line !!!${_}!!! at $i " if(not(m/.+\t[0-9]{1,}\t[0-9]{1,}\t[A-Za-z0-9]{1,}.+/gs)); $i++}(@bed);' Target_annotated_intervals.bed

If you get any results, please post them; it might help other people debug their own files.

lakhujanivijay commented 4 years ago

This time I got some errors:

perl -wne 'my @bed =<ARGV>;my $i = 0; map {s/(?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])//;chomp ;warn "Invalid line !!!${_}!!! at $i " if(not(m/.+\t[0-9]{1,}\t[0-9]{1,}\t[A-Za-z0-9]{1,}.+/gs)); $i++}(@bed);' Target_annotated_intervals.bed
Invalid line !!!6   166571802   166572076   T!!! at 71623  at -e line 1, <> line 193137.
Invalid line !!!6   166574324   166574454   T!!! at 71624  at -e line 1, <> line 193137.
Invalid line !!!6   166575934   166576108   T!!! at 71625  at -e line 1, <> line 193137.
Invalid line !!!6   166578092   166578154   T!!! at 71626  at -e line 1, <> line 193137.
Invalid line !!!6   166578287   166578349   T!!! at 71627  at -e line 1, <> line 193137.
Invalid line !!!6   166579193   166579328   T!!! at 71628  at -e line 1, <> line 193137.
Invalid line !!!6   166580079   166580344   T!!! at 71629  at -e line 1, <> line 193137.
Invalid line !!!6   166580873   166581079   T!!! at 71630  at -e line 1, <> line 193137.

What does that mean?

mmterpstra commented 4 years ago

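Yes, this helps! The final '.+' in the pattern has nothing left to match when the name in the fourth column is a single character such as 'T'. If you change 'T' to a name of two or more characters, it will work. A lazy fix: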

perl -wlape 'if(length($F[3]) == 1 ){$F[3].="_".$F[3];$_=join("\t",@F); warn "Fixed line number $. new content !!!$_!!!";}' old.bed > new.bed
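
To illustrate why the first check printed nothing while the second one flagged these lines, here is a minimal Python sketch of the same pattern. It assumes the BED file is tab-separated and that the first check saw each line with its trailing newline still attached; the helper name matches and the sample line (rebuilt from the output above) are only for illustration, and re.DOTALL stands in for Perl's /s flag.

```python
import re

# Same pattern as quoted in the thread; re.DOTALL mirrors Perl's /s flag,
# so "." is also allowed to match a newline.
BED_LINE = re.compile(r".+\t[0-9]{1,}\t[0-9]{1,}\t[A-Za-z0-9]{1,}.+", re.DOTALL)


def matches(line: str) -> bool:
    """Unanchored search, comparable to Perl's m/.../ on a single line."""
    return BED_LINE.search(line) is not None


# Hypothetical sample rebuilt from the reported output, with tab separators.
raw = "6\t166571802\t166572076\tT\n"  # line with its newline still attached
stripped = raw.rstrip("\r\n")          # line after the line ending is removed

print(matches(raw))       # True: the newline satisfies the final ".+"
print(matches(stripped))  # False: nothing follows "T" for the final ".+"
print(matches("6\t166571802\t166572076\tTT"))  # True: second "T" feeds ".+"
```

Under these assumptions, the pattern effectively requires at least two characters after the third tab, which is why a one-letter name only fails once the line ending has been stripped.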

lakhujanivijay commented 4 years ago

Sorry, I did not follow. How do I edit the BED file so that it works? What is wrong with the BED file? Can you please elaborate?

mmterpstra commented 4 years ago

A single-letter name causes errors in CoNVaDING. I also updated the code above, because the gene name T should be changed to one constant name for the gene-based normalisation. It now changes T to T_T instead of T$linenumber.
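
For readers who prefer not to use the Perl one-liner, the same idea could be sketched in Python roughly as follows. This is only an illustration under the assumption of a tab-separated BED file with the name in the fourth column; the function name fix_single_char_names and the example lines are hypothetical. Unlike Perl's -a switch, which splits on any whitespace, this sketch splits on tabs only.

```python
def fix_single_char_names(lines):
    """Yield BED lines with one-character names in column 4 doubled (T -> T_T).

    Every occurrence of the same one-letter gene gets the same new name,
    so the regions still belong to a single gene.
    """
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 4 and len(fields[3]) == 1:
            fields[3] = fields[3] + "_" + fields[3]
        yield "\t".join(fields)


# Hypothetical example lines (tab-separated), two of them with the name "T".
example = [
    "6\t166571802\t166572076\tT\n",
    "6\t166574324\t166574454\tT\n",
    "1\t69090\t70008\tOR4F5\n",
]

fixed = list(fix_single_char_names(example))
for line in fixed:
    print(line)
```

Because the new name depends only on the old name and not on the line number, all regions of gene T end up under the same name T_T, which is what the gene-based normalisation needs.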