williamritchie / IRFinder

Detecting intron retention from RNA-Seq experiments

Incomplete in reference building #137

Closed PencilsSharpener closed 3 years ago

PencilsSharpener commented 3 years ago

I ran into an issue when using the "BuildRef" mode to build a reference from the Ensembl FTP site. The run appears to crash when the file type of stdin cannot be determined:

bash: -c: line 0: syntax error near unexpected token ('
bash: -c: line 0:samtools view genome_fragments.bam usage: CAT (prepare | contigs | bin | bins | add_names | summarise) [-v / --version] [-h / --help]|awk -v tmpdir="tmp_211035" -v tmpcmp="/usr/bin/lzop" -v tmpext="lzo" 'BEGIN{FS="[\t!]"; OFS="\t"}{if (($8 == "70M") && ($3 == $6) && ($2 == $5)) {print $5, $6-1, $6+69 | (tmpcmp " -c1 > " tmpdir "/" $5 ".bed." tmpext ) }}END{close( (tmpcmp " -c1 > " tmpdir "/" $5 ".bed." tmpext ))}''
[main_samview] region "CAT:" specifies an invalid region or unknown reference. Continue anyway.
[main_samview] region "error:" specifies an invalid region or unknown reference. Continue anyway.
[main_samview] region "one" specifies an invalid region or unknown reference. Continue anyway.
[main_samview] region "of" specifies an invalid region or unknown reference. Continue anyway.
[main_samview] region "the" specifies an invalid region or unknown reference. Continue anyway.
[main_samview] region "arguments" specifies an invalid region or unknown reference. Continue anyway.
[main_samview] region "prepare" specifies an invalid region or unknown reference. Continue anyway.
[main_samview] region "contigs" specifies an invalid region or unknown reference. Continue anyway.
[main_samview] region "bin" specifies an invalid region or unknown reference. Continue anyway.
[main_samview] region "bins" specifies an invalid region or unknown reference. Continue anyway.
[main_samview] region "add_names" specifies an invalid region or unknown reference. Continue anyway.
[main_samview] region "summarise" specifies an invalid region or unknown reference. Continue anyway.
[main_samview] region "is" specifies an invalid region or unknown reference. Continue anyway.
[main_samview] region "required" specifies an invalid region or unknown reference. Continue anyway.
Feb 21 10:33:33 ... merging filtered genome fragments...
lzop: : not a lzop file
Feb 21 10:33:33 ... calculating regions for exclusion...
Feb 21 10:33:49 ... cleaning temporary files...
rm: missing operand
Try 'rm --help' for more information.
<Phase 3: IRFinder Reference Preparation>
Feb 21 10:33:54 ... building Ref 1...
Feb 21 10:34:21 ... building Ref 2...
Feb 21 10:34:21 ... building Ref 3...
Feb 21 10:34:21 ... building Ref 4...
Error: unable to open file or unable to determine types for file stdin

dg520 commented 3 years ago

@PencilsSharpener Interesting. I believe the error is due to Line 50 in the bin/util/Mapability script:

cat "$STARGENOME/chrName.txt" 

This line is supposed to print the content of chrName.txt in the STAR reference folder, which sits inside the IRFinder reference folder. However, your bash interpreter somehow resolved it to CAT (which expects one of the arguments prepare | contigs | bin | bins | add_names | summarise) instead of the standard cat command.

A quick Google search turns up a Contig Annotation Tool called CAT that takes the aforementioned arguments. By default, this tool is invoked as CAT (all capital letters), but your bash apparently also runs it when cat (all lowercase) is called. My best guess is that there is an alias between cat and CAT, which should be avoided: it masks the standard cat command and leads to this error.

Could you please run cat --help (all in SMALL letters) in your terminal and check whether it returns the manual of Linux cat or Contig Annotation Tool?
Also, did you run IRFinder on a local computational node (i.e. your own machine, or a defined computer) or via a cloud-based system with a job submission mechanism? If the latter, the Linux cat could be masked by some server-side settings, which might be hard for a non-admin user to change.
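
The check above could be sketched roughly as follows. This is only an illustration, assuming a POSIX-like shell; the function name check_cat is hypothetical and not part of IRFinder. It should be pasted into the same terminal session that launches IRFinder, since aliases defined in a startup file are usually only visible there.

```shell
# Show how the shell resolves the name "cat" (alias, function, or file path).
type cat

# Functional check: a standard cat copies stdin to stdout unchanged.
# check_cat is a hypothetical helper name used only for this sketch.
check_cat() {
    out=$(printf 'chr1\n' | cat 2>/dev/null)
    if [ "$out" = "chr1" ]; then
        echo "ok"
    else
        echo "masked"
    fi
}

check_cat
```

If this prints "masked", or if type reports an alias or an unexpected path, then cat is not resolving to the standard utility in that session.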

Let me know the answers. Thanks.

PencilsSharpener commented 3 years ago

You are right! I had previously set up an alias between cat and CAT (Comparative-Annotation-Toolkit) in my .bashrc file, and I have now changed it. I'll let you know the results once this run has finished. Thank you!
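
For anyone hitting the same problem, a rough way to locate such an alias is sketched below. The exact alias line used in this case was not posted, so the pattern assumes a definition of the form alias cat=...; the helper name find_cat_alias is hypothetical.

```shell
# Print, with line numbers, any line in the given startup file that
# defines an alias named "cat". find_cat_alias is a hypothetical helper.
find_cat_alias() {
    grep -nE '^[[:space:]]*alias[[:space:]]+cat=' "$1"
}

# Example call, guarded so it is skipped when no ~/.bashrc exists.
if [ -f "$HOME/.bashrc" ]; then
    find_cat_alias "$HOME/.bashrc" || echo "no cat alias found"
fi
```

After removing or renaming the alias, a new shell session (or running unalias cat in the current one) is needed before the change takes effect.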

PencilsSharpener commented 3 years ago

OK, I got past that issue, but now there is another one that leaves the output files empty:

<Phase 3: IRFinder Reference Preparation>
Feb 21 13:01:04 ... building Ref 1...
sort: unknown subpragma '_mergesort' at /scratch/ping/cowboy_scratch/fibroblast_lncRNA/raw_data/newdata/ASE/IRFinder-1.3.0/bin/util/gtf2bed-custom.pl line 33.
BEGIN failed--compilation aborted at /scratch/ping/cowboy_scratch/fibroblast_lncRNA/raw_data/newdata/ASE/IRFinder-1.3.0/bin/util/gtf2bed-custom.pl line 33.
Feb 21 13:01:04 ... building Ref 2...
Feb 21 13:01:07 ... building Ref 3...
Feb 21 13:01:07 ... building Ref 4...
Feb 21 13:01:12 ... building Ref 5...
Feb 21 13:01:18 ... building Ref 6...
Feb 21 13:01:18 ... building Ref 7...
Feb 21 13:01:18 ... building Ref 8...
Feb 21 13:01:18 ... building Ref 9...
Feb 21 13:01:18 ... building Ref 10c...
Feb 21 13:01:18 ... building Ref 11c...
Error: exclude.directional.bed is empty.
Error: introns.unique.bed is empty.
Error: ref-cover.bed is empty.
Error: ref-read-continues.ref is empty.
Error: ref-sj.ref is empty.
Error: IRFinder reference building FAILED.

dg520 commented 3 years ago

Which Perl version are you using?

PencilsSharpener commented 3 years ago

I used Perl v5.26.2. Is that older than the required version?

dg520 commented 3 years ago

@PencilsSharpener That's a bit weird. In an interactive Perl console, can you run

use sort '_mergesort';

without error? The error message you shared suggested you could not do the above successfully, but v5.26.2 is supposed to support '_mergesort' out-of-the-box.
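
The same test can also be run from the shell rather than an interactive console. The sketch below is only an illustration: check_mergesort is a hypothetical helper name, and the optional argument lets you point it at a specific perl binary in case more than one interpreter is installed.

```shell
# Report the numeric version ($]) of a perl binary and whether it accepts
# the '_mergesort' subpragma. check_mergesort is a hypothetical helper.
check_mergesort() {
    perl_bin=${1:-perl}
    if ! command -v "$perl_bin" >/dev/null 2>&1; then
        echo "perl not found"
        return 1
    fi
    "$perl_bin" -e 'print "version: $]\n"'
    if "$perl_bin" -e "use sort '_mergesort';" 2>/dev/null; then
        echo "_mergesort: accepted"
    else
        echo "_mergesort: rejected"
    fi
}

# Check whichever perl is first on PATH.
check_mergesort || true
```

If the version printed here differs from what perl -v reports in your terminal, the two commands are likely picking up different interpreters.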

I would suggest you use a Perl version higher than 5.28.0 if possible. Otherwise, you can keep your current Perl and try commenting out Line 29 of bin/util/gtf2bed-custom.pl, which reads

use if $]<5.028, "sort", '_mergesort';

However, this may or may not raise a sort-stability concern. With either solution, you will have to re-run everything from the beginning.
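
To make the version guard in that line concrete: Perl's $] holds the version in numeric form, so v5.26.2 becomes 5.026002, which is below 5.028. The arithmetic can be sketched in shell as below. The helper names perl_numeric_version and guard_applies are hypothetical, and the sketch only mirrors the comparison; it does not run Perl.

```shell
# Convert a dotted version such as 5.26.2 into the numeric form used by $].
perl_numeric_version() {
    echo "$1" | awk -F. '{ printf "%d.%03d%03d\n", $1, $2, $3 }'
}

# Succeed when the given version satisfies the guard `$] < 5.028`.
guard_applies() {
    awk -v v="$(perl_numeric_version "$1")" 'BEGIN { exit !(v < 5.028) }'
}

for ver in 5.26.2 5.28.0 5.30.1; do
    if guard_applies "$ver"; then
        echo "$ver -> $(perl_numeric_version "$ver"): pragma loaded"
    else
        echo "$ver -> $(perl_numeric_version "$ver"): pragma skipped"
    fi
done
```

Under this comparison the pragma is requested only on interpreters older than 5.28, which is why the line matters for v5.26.2 but not for newer versions.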

PencilsSharpener commented 3 years ago

Hi Dadi, I solved the error by commenting out the line you suggested, and the reference can now be used for IR identification. Thank you!

dg520 commented 3 years ago

@PencilsSharpener Glad to hear that and you're welcome!