williamritchie / IRFinder

Detecting intron retention from RNA-Seq experiments

Files missing in output folder #83

Closed qicaibiology closed 4 years ago

qicaibiology commented 4 years ago

Hi Dadi:

After finishing the reference build, I tried IRFinder on the FASTQ files I downloaded, but some files are missing from the output folder. Here is what I ran:

srun -c 10 --mem-per-cpu=10G -n 1 -N 1 -p bigmem2 bin/IRFinder -r REF/Mouse-GRCm38/ -d SRR645846_irfinder/ SRR645846_1.fastq SRR645846_2.fastq > log003 2> error003

In the output folder I have these files:

[caiqi@midway2-login2 SRR645846_irfinder]$ ls

Log.out Log.progress.out Log.std.out Unsorted.bam WARNINGS _STARtmp irfinder.stderr irfinder.stdout trim.log

The WARNINGS file contains:

ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.
WARN: ERROR: The paired-end trimming routine appeared not to have completed. It may have crashed if there was corruption in the input fastq files. Do the input fastq files have the same number of lines?
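The last question in that warning can be checked directly before rerunning. Below is a minimal bash sketch, assuming standard 4-line FASTQ records (plain or gzip-compressed) and the usual `gzip` and `wc` tools. The function names are hypothetical and are not part of IRFinder.

```bash
#!/usr/bin/env bash
# Sketch: answer the trimming warning's question ("Do the input fastq files have
# the same number of lines?") for one read pair. Function names are hypothetical;
# assumes standard 4-line FASTQ records, plain or gzip-compressed.

# Print the number of lines in a FASTQ file, decompressing *.gz on the fly.
fastq_line_count() {
  if [[ "$1" == *.gz ]]; then
    gzip -cd "$1" | wc -l
  else
    wc -l < "$1"
  fi
}

# Succeed only if both mates have the same line count and it is a multiple of 4.
check_fastq_pair() {
  local n1 n2
  n1=$(fastq_line_count "$1")
  n2=$(fastq_line_count "$2")
  echo "R1: $n1 lines, R2: $n2 lines"
  if (( n1 != n2 )); then
    echo "Mismatch: the two mates have different line counts" >&2
    return 1
  fi
  if (( n1 % 4 != 0 )); then
    echo "Line count is not a multiple of 4: a file may be truncated" >&2
    return 1
  fi
}
```

For example, `check_fastq_pair SRR645846_1.fastq SRR645846_2.fastq`. Equal counts do not prove the files are intact, but unequal counts point to a truncated or corrupted download.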

I re-ran the trimming, but I still get the same problem.

What should I do now?

Cai

dg520 commented 4 years ago

Hi @qicaibiology ,

I tried SRR645846 on my end, and IRFinder completed the analysis successfully.

Did you copy the entire error message you got? During reference preparation, did you pay close attention to the standard output and standard error, and are you sure there were no errors or warnings? Please also post your STAR, BEDTOOLS and GLIBC versions here.

Best, Dadi

qicaibiology commented 4 years ago

Hi Dadi: Sorry for the late reply. I read some other issues here that describe a situation similar to mine. After checking the standard error in my files, I guessed the reason was that I don't have GLIBC, so I ran:

conda install -c asmeurer glibc

Then I re-ran the trimming and the other steps. I just finished trying it on one file, and it seems to have worked.

I will take it through to the final step to see how it goes. Thanks a lot for your response.

Cai

qicaibiology commented 4 years ago

Hi Dadi: The files in my reference directory are:

[caiqi@midway2-login2 Mouse-GRCm38]$ ls

IRFinder Mapability STAR genome.fa logSTARbuild transcripts.gtf

The files in IRFinder are:

[caiqi@midway2-login2 IRFinder]$ ls

exclude.directional.bed introns.unique.bed ref-read-continues.ref exclude.omnidirectional.bed ref-ROI.bed ref-sj.ref intergenic.ROI.bed ref-cover.bed

There is no error message here.

STAR --version: 2.6.1b
bedtools --version: v2.27.1

(base) [caiqi@midway2-login1 IRFinder-1.2.6_hotfix_R_bug]$ ldd --version

ldd (GNU libc) 2.19
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

After installing GLIBC, the WARNINGS and stderr files from the quantification step are empty.

Looking forward to your advice,

Cai

qicaibiology commented 4 years ago

However, I got an error at the step that pools replicates of the same condition together:

(base) [caiqi@midway2-login1 IRFinder-1.2.6_hotfix_R_bug]$ srun -c 10 --mem-per-cpu=10G -n 1 -N 1 -p bigmem2 bin/IRFinder -m BAM -r REF/Mouse-GRCm38/ -d pooled_div-8/ <(samtools cat SRR645824_irfinder/Unsorted.bam SRR645826_irfinder/Unsorted.bam )

srun: job 51809 queued and waiting for resources
srun: job 51809 has been allocated resources
gzip: /dev/fd/63.gz: No such file or directory
bin/IRFinder: line 562: 28620 Exit 1 gzip -cd "$1"
28621 Aborted | "$LIBEXEC/irfinder" "$OUTPUTDIR" "$REF/IRFinder/ref-cover.bed" "$REF/IRFinder/ref-sj.ref" "$REF/IRFinder/ref-read-continues.ref" "$REF/IRFinder/ref-ROI.bed" "$OUTPUTDIR/unsorted.frag.bam" >> "$OUTPUTDIR/irfinder.stdout" 2>> "$OUTPUTDIR/irfinder.stderr"
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.

I then looked at the output files directly to find the cause. The irfinder.stderr file shows:

terminate called after throwing an instance of 'std::length_error'
what(): basic_string::_S_create

Thank you,

Cai

dg520 commented 4 years ago

Hi @qicaibiology ,

Your IRFinder quantification step didn't complete correctly; that's why your pooling step also failed. We have to figure out where and why the quant step fails. Please don't rush to run anything else, and follow my guidance.
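Since a broken quantification also breaks pooling, it may help to screen each sample folder before pooling. Below is a minimal bash sketch. Its checks are assumptions based only on what this thread shows (a failed run leaves `ERROR:` lines in `WARNINGS`, and a clean run left that file empty), not on IRFinder documentation, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: screen per-sample IRFinder output folders before pooling replicates.
# Heuristics assumed from this thread, not from IRFinder documentation.
# Function names are hypothetical.

# Succeed if the folder has a non-empty Unsorted.bam and no ERROR in WARNINGS.
quant_looks_complete() {
  local dir="$1"
  if [ ! -s "$dir/Unsorted.bam" ]; then
    echo "$dir: Unsorted.bam missing or empty" >&2
    return 1
  fi
  if [ -f "$dir/WARNINGS" ] && grep -q 'ERROR' "$dir/WARNINGS"; then
    echo "$dir: WARNINGS reports an error" >&2
    return 1
  fi
  return 0
}

# Check every folder given, report all failures, and succeed only if none failed.
all_quants_complete() {
  local d status=0
  for d in "$@"; do
    quant_looks_complete "$d" || status=1
  done
  return "$status"
}
```

For example, run the pooling step only if `all_quants_complete SRR645824_irfinder SRR645826_irfinder` succeeds. The size check alone would not catch a tiny but non-empty BAM, which is why the WARNINGS check is included.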

Please post here the ENTIRE contents of your SRR645846_irfinder/log003 and SRR645846_irfinder/error003 files, if you still have them. Otherwise, rerun the following:

srun -c 10 --mem-per-cpu=10G -n 1 -N 1 -p bigmem2 bin/IRFinder -r REF/Mouse-GRCm38/ -d SRR645846_irfinder/ SRR645846_1.fastq SRR645846_2.fastq > log003 2> error003

Meanwhile, please try to run

ldd --version

This might not work on your system, but if it does, post here everything it prints on your screen.

Please don't install things you're not sure about. Other people on GitHub very likely have a different problem from yours, even if it sounds similar.

Best, Dadi

qicaibiology commented 4 years ago

I just re-ran the command:

[caiqi@midway2-login2 IRFinder-1.2.6_hotfix_R_bug]$ srun -c 10 --mem-per-cpu=10G -n 1 -N 1 -p bigmem2 bin/IRFinder -r REF/Mouse-GRCm38/ -d SRR645846_irfinder/ SRR645846_1.fastq SRR645846_2.fastq > log003 2> error003 &

The contents of SRR645846_irfinder/ are:

[caiqi@midway2-login2 SRR645846_irfinder]$ ls

Log.out Log.progress.out Log.std.out Unsorted.bam WARNINGS _STARtmp irfinder.stderr irfinder.stdout trim.log

log003 is empty. The content of error003 is:

srun: job 53459 queued and waiting for resources
srun: job 53459 has been allocated resources
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.
WARN: ERROR: The paired-end trimming routine appeared not to have completed. It may have crashed if there was corruption in the input fastq files. Do the input fastq files have the same number of lines?

[caiqi@midway2-login2 IRFinder-1.2.6_hotfix_R_bug]$ ldd --version

ldd (GNU libc) 2.19
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

I noticed that after I logged out of the terminal following my last message, conda can no longer be activated with conda activate; previously it was always activated automatically when I logged in. I can still use module load to get these tools, but I mention it in case it helps the analysis.

Thanks

Cai

qicaibiology commented 4 years ago

Does it look like it works better under the conda environment?

dg520 commented 4 years ago

Hi @qicaibiology ,

Let's make sure you have successfully built the IRFinder reference. Please go to your IRFinder reference folder and post the ENTIRE contents of the following two files:

logSTARbuild/Log.out
Mapability/Log.out

Also in the IRFinder reference folder, run the following:

ls -ll IRFinder

and post all the file sizes here.

Finally, which version of Perl are you using? You can check it with perl -v. Perl versions from v5.28.0 onward differ in their sort usage, which leads to a broken reference build.
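A minimal bash sketch of that Perl check, assuming GNU coreutils `sort -V` for version ordering. The 5.28.0 threshold is the one stated above; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: flag a Perl that is v5.28.0 or newer, the range said above to break
# IRFinder reference building. Function names are hypothetical; needs GNU sort -V.

# Succeed if dotted version $1 >= version $2 (GNU version ordering).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the running Perl's version as e.g. "5.16.3" (no leading "v").
perl_version() {
  perl -e 'printf "%vd\n", $^V'
}

# Return 1 (and warn) if the Perl in PATH is at or above 5.28.0.
check_perl_for_irfinder() {
  local v
  v=$(perl_version)
  if version_at_least "$v" "5.28.0"; then
    echo "Perl $v is v5.28.0 or newer: reference building may break" >&2
    return 1
  fi
  echo "Perl $v is older than v5.28.0"
}
```

`sort -V` compares numerically per component, so 5.9.0 correctly sorts before 5.28.0, which a plain string comparison would get wrong.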

qicaibiology commented 4 years ago

Hi Dadi: Thank you for your patience, I really appreciate it. Under REF/ there are two folders:

[caiqi@midway2-login2 REF]$ ls

Mouse-GRCm38 extra-input-files

Under the reference folder Mouse-GRCm38:

[caiqi@midway2-login2 Mouse-GRCm38]$ ls

IRFinder Mapability STAR genome.fa logSTARbuild transcripts.gtf

Under logSTARbuild:

[caiqi@midway2-login2 logSTARbuild]$ ls

Log.out

And the content of Log.out is:

STAR version=STAR_2.6.1b
STAR compilation time,server,dir=Wed Oct 17 12:10:01 CDT 2018 midway2-login1.rcc.local:/software/STAR-2.6.1b-el7-x86_64/source

DEFAULT parameters:

versionSTAR 20201
versionGenome 20101 20200
parametersFiles -
sysShell -
runMode alignReads
runThreadN 1
runDirPerm User_RWX
runRNGseed 777
genomeDir ./GenomeDir/
genomeLoad NoSharedMemory
genomeFastaFiles -
genomeChainFiles -
genomeSAindexNbases 14
genomeChrBinNbits 18
genomeSAsparseD 1
genomeSuffixLengthMax 18446744073709551615
genomeFileSizes 0
genomeConsensusFile -
readFilesType Fastx
readFilesIn Read1 Read2
readFilesPrefix -
readFilesCommand -
readMatesLengthsIn NotEqual
readMapNumber 18446744073709551615

Under Mapability:

[caiqi@midway2-login2 Mapability]$ ls

Log.final.out Log.out Log.progress.out Log.std.out MapabilityExclusion.bed.gz SJ.out.tab

The content of Mapability/Log.out is:

STAR version=STAR_2.6.1b
STAR compilation time,server,dir=Wed Oct 17 12:10:01 CDT 2018 midway2-login1.rcc.local:/software/STAR-2.6.1b-el7-x86_64/source

DEFAULT parameters:

versionSTAR 20201
versionGenome 20101 20200
parametersFiles -
sysShell -
runMode alignReads
runThreadN 1
runDirPerm User_RWX
runRNGseed 777
genomeDir ./GenomeDir/
genomeLoad NoSharedMemory
genomeFastaFiles -
genomeChainFiles -
genomeSAindexNbases 14
genomeChrBinNbits 18
genomeSAsparseD 1
genomeSuffixLengthMax 18446744073709551615
genomeFileSizes 0
genomeConsensusFile -
readFilesType Fastx
readFilesIn Read1 Read2
readFilesPrefix -
readFilesCommand -
readMatesLengthsIn NotEqual
readMapNumber 18446744073709551615

Perl version:

[caiqi@midway2-login2 Mapability]$ perl --version

This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi (with 39 registered patches, see perl -V for more detail)

Copyright 1987-2012, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on this system using "man perl" or "perldoc perl". If you have access to the Internet, point your browser at http://www.perl.org/, the Perl Home Page.

[caiqi@midway2-login2 Mouse-GRCm38]$ ls -ll IRFinder/

total 147200
-rw-rw-r-- 1 caiqi caiqi 18553351 Apr 15 14:48 exclude.directional.bed
-rw-rw-r-- 1 caiqi caiqi 25874468 Apr 15 14:48 exclude.omnidirectional.bed
-rw-rw-r-- 1 caiqi caiqi 291166 Apr 15 14:48 intergenic.ROI.bed
-rw-rw-r-- 1 caiqi caiqi 11954588 Apr 15 14:48 introns.unique.bed
-rw-rw-r-- 1 caiqi caiqi 324171 Apr 15 14:48 ref-ROI.bed
-rw-rw-r-- 1 caiqi caiqi 82688448 Apr 15 14:48 ref-cover.bed
-rw-rw-r-- 1 caiqi caiqi 5316472 Apr 15 14:48 ref-read-continues.ref
-rw-rw-r-- 1 caiqi caiqi 5233601 Apr 15 14:48 ref-sj.ref

Thank you, and let me know what I should do next.

dg520 commented 4 years ago

Hi @qicaibiology ,

I cannot see the full content of the two Log.out files. Could you please copy and paste them again and make sure they're complete, from top to bottom?

qicaibiology commented 4 years ago

Here it is:

logSTARbuild/Log.out:

STAR version=STAR_2.6.1b
STAR compilation time,server,dir=Wed Oct 17 12:10:01 CDT 2018 midway2-login1.rcc.local:/software/STAR-2.6.1b-el7-x86_64/source

DEFAULT parameters:

versionSTAR 20201
versionGenome 20101 20200
parametersFiles -
sysShell -
runMode alignReads
runThreadN 1
runDirPerm User_RWX
runRNGseed 777
genomeDir ./GenomeDir/
genomeLoad NoSharedMemory
genomeFastaFiles -
genomeChainFiles -
genomeSAindexNbases 14
genomeChrBinNbits 18
genomeSAsparseD 1
genomeSuffixLengthMax 18446744073709551615
genomeFileSizes 0
genomeConsensusFile -
readFilesType Fastx
readFilesIn Read1 Read2
readFilesPrefix -
readFilesCommand -
readMatesLengthsIn NotEqual
readMapNumber 18446744073709551615

Mapability/Log.out:

STAR version=STAR_2.6.1b
STAR compilation time,server,dir=Wed Oct 17 12:10:01 CDT 2018 midway2-login1.rcc.local:/software/STAR-2.6.1b-el7-x86_64/source

DEFAULT parameters:

versionSTAR 20201
versionGenome 20101 20200
parametersFiles -
sysShell -
runMode alignReads
runThreadN 1
runDirPerm User_RWX
runRNGseed 777
genomeDir ./GenomeDir/
genomeLoad NoSharedMemory
genomeFastaFiles -
genomeChainFiles -
genomeSAindexNbases 14
genomeChrBinNbits 18
genomeSAsparseD 1
genomeSuffixLengthMax 18446744073709551615
genomeFileSizes 0
genomeConsensusFile -
readFilesType Fastx
readFilesIn Read1 Read2
readFilesPrefix -
readFilesCommand -
readMatesLengthsIn NotEqual
readMapNumber 18446744073709551615

dg520 commented 4 years ago

Hi @qicaibiology ,

Still not intact. Please email me at dgao2 at mgh dot harvard dot edu with these two files attached, and tell me which one is from the logSTARbuild folder and which one is from the Mapability folder. I'll look into them. Thanks.

qicaibiology commented 4 years ago

Sent through email, please have a look.

Sorry, I didn't realize I hadn't pasted the full content.

dg520 commented 4 years ago

Hi @qicaibiology ,

I made an IRFinder reference from your genome and annotation. As I said earlier, the chromosomes in your FASTA file DO NOT match the chromosomes in your GTF file: your GTF file has more chromosomes than the FASTA (which only has chromosomes 1-19, X, Y and MT). This leads to an incomplete reference. See the following warning and error messages from reference preparation:

Build Ref 1
Build Ref 2
Build Ref 3
Build Ref 4
***** WARNING: File introns.unique.bed has inconsistent naming convention for record:
CHR_MG117_PATCH 108783870       108796063       Ip6k2/ENSMUSG00000106672/+      0       +

***** WARNING: File introns.unique.bed has inconsistent naming convention for record:
CHR_MG117_PATCH 108783870       108796063       Ip6k2/ENSMUSG00000106672/+      0       +

Build Ref 5
***** WARNING: File introns.unique.bed has inconsistent naming convention for record:
CHR_MG117_PATCH 108783870       108796063       Ip6k2/ENSMUSG00000106672/+      0       +

***** WARNING: File introns.unique.bed has inconsistent naming convention for record:
CHR_MG117_PATCH 108783870       108796063       Ip6k2/ENSMUSG00000106672/+      0       +

Build Ref 6
Build Ref 7
Build Ref 8
Error: requested chromosome CHR_MG117_PATCH does not exist in the genome file /dev/fd/63. Exiting.
Build Ref 9
Build Ref 10c
Build Ref 11c
COMPLETE
Ref build result: 0
ALL DONE

Pay attention to the WARNING: it tells you exactly that the files are inconsistent. In addition, you encountered an ERROR towards the end. Although this is not good, it is NOT going to make your quantification step fail, because IRFinder only counts IR on the chromosomes present in the STAR genome reference. In your case, these are chromosomes 1-19, X, Y and MT.
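One way to spot this mismatch before building a reference is to compare the sequence names in the two files. Below is a minimal bash sketch, assuming an uncompressed FASTA and GTF (named `genome.fa` and `transcripts.gtf` in the reference folder above). The function names are hypothetical. Dropping GTF records on contigs that are missing from the FASTA is one possible way to avoid the warnings above; it is an assumption here, not a step IRFinder performs itself.

```bash
#!/usr/bin/env bash
# Sketch: compare chromosome names between a genome FASTA and a GTF annotation.
# Function names are hypothetical; inputs are assumed to be uncompressed.

# Sequence names from FASTA headers: text after ">" up to the first whitespace.
fasta_chroms() {
  grep '^>' "$1" | sed -e 's/^>//' -e 's/[[:space:]].*//' | sort -u
}

# Sequence names from column 1 of every non-comment, non-empty GTF line.
gtf_chroms() {
  awk -F'\t' '!/^#/ && NF > 0 { print $1 }' "$1" | sort -u
}

# Names present in the GTF but absent from the FASTA (e.g. patch scaffolds).
missing_from_fasta() {
  comm -13 <(fasta_chroms "$1") <(gtf_chroms "$2")
}

# Print the GTF restricted to chromosomes that exist in the FASTA,
# keeping header comment lines.
filter_gtf_to_fasta() {
  awk -F'\t' 'NR == FNR { keep[$1]; next } /^#/ || ($1 in keep)' \
    <(fasta_chroms "$1") "$2"
}
```

For example, `missing_from_fasta genome.fa transcripts.gtf` should list names such as CHR_MG117_PATCH, and `filter_gtf_to_fasta genome.fa transcripts.gtf > transcripts.filtered.gtf` (a hypothetical output name) writes a GTF that only uses chromosomes the FASTA contains.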

To confirm this, I ran the quant step on sample SRR645846 downloaded from https://www.ebi.ac.uk/ena/data/view/SRR645846&display=html, which should be the same sample you used. It completed successfully.

This means there is no problem with the genome and annotation (although they are not perfect), no problem with the sample FASTQ (if you also downloaded it from the same site as me), and no problem with IRFinder. The only explanation I can think of is that something in your system is incompatible with IRFinder. Now let me ask you: what file size do you get for Unsorted.bam in your output folder? You can get this by running:

ls -ll Unsorted.bam
qicaibiology commented 4 years ago

Thanks for explaining.

The size of the BAM file you mentioned is:

[caiqi@midway2-login1 SRR645846_irfinder]$ ls -ll Unsorted.bam

-rw-rw-r-- 1 caiqi caiqi 39844 Apr 15 20:09 Unsorted.bam

Is that right?

Cai


qicaibiology commented 4 years ago

Hi Dadi:

One more thing I want to add: the BAM file I got was generated yesterday under the conda environment, because at first I thought our server did not have GLIBC.

However, I later found that the server already has it, since I can run ldd --version.

But the quantification step still seems to crash, as I showed yesterday. Very weird. Do you know how to load glibc? I mean something similar to module load STAR: what is the "module load" command for glibc?

Thanks a lot for your help,

Cai


dg520 commented 4 years ago

OK. It seems it failed really early; STAR alignment didn't even start. Can you do the following in the IRFinder-1.2.6_hotfix_R_bug folder:

cd src/trim
make

Does the above finish without errors? Can you see a binary file named trim in the folder (not trim.o or trim.cpp)? If you can, do the following:

cp trim ../../bin/util/

Let me know if you get that far. If you can't, please post here all the messages printed on your screen after make. This will help me tell whether it's due to glibc or not.

qicaibiology commented 4 years ago
  1. Can you finish the above without error?

[caiqi@midway2-login1 trim]$ make

Makefile:33: Depend.list: No such file or directory

/bin/rm -f ./Depend.list

g++ -pipe -std=c++0x -O3 -Wall -Wextra -fopenmp -D'SVN_VERSION_COMPILED="magictrim0.1"' -D'COMPILATION_TIME_PLACE="Thu Apr 16 17:18:06 CDT 2020 midway2-login1.rcc.local:/scratch/midway2/caiqi/IRFinder_1/IRFinder-1.2.6_hotfix_R_bug/src/trim"' -MM TrimReads.cpp sequenceTools.cpp trim.cpp >> Depend.list

g++ -c -pipe -std=c++0x -O3 -Wall -Wextra -fopenmp -D'SVN_VERSION_COMPILED="magictrim0.1"' -D'COMPILATION_TIME_PLACE="Thu Apr 16 17:18:06 CDT 2020 midway2-login1.rcc.local:/scratch/midway2/caiqi/IRFinder_1/IRFinder-1.2.6_hotfix_R_bug/src/trim"' TrimReads.cpp

TrimReads.cpp: In member function 'int TrimReads::trimAll()':

TrimReads.cpp:162:73: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

for(int i = lName1+1+min(InsertLen-lAdapt2prefix,lR1); i<(lName1+1+lR1); i++){

                                                                     ^

TrimReads.cpp:175:73: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

for(int i = lName2+1+min(InsertLen-lAdapt1prefix,lR2); i<(lName2+1+lR2); i++){

                                                                     ^

g++ -c -pipe -std=c++0x -O3 -Wall -Wextra -fopenmp -D'SVN_VERSION_COMPILED="magictrim0.1"' -D'COMPILATION_TIME_PLACE="Thu Apr 16 17:18:06 CDT 2020 midway2-login1.rcc.local:/scratch/midway2/caiqi/IRFinder_1/IRFinder-1.2.6_hotfix_R_bug/src/trim"' sequenceTools.cpp

g++ -c -pipe -std=c++0x -O3 -Wall -Wextra -fopenmp -D'SVN_VERSION_COMPILED="magictrim0.1"' -D'COMPILATION_TIME_PLACE="Thu Apr 16 17:18:06 CDT 2020 midway2-login1.rcc.local:/scratch/midway2/caiqi/IRFinder_1/IRFinder-1.2.6_hotfix_R_bug/src/trim"' trim.cpp

g++ -o trim -pipe -std=c++0x -O3 -Wall -Wextra -fopenmp -D'SVN_VERSION_COMPILED="magictrim0.1"' -D'COMPILATION_TIME_PLACE="Thu Apr 16 17:18:06 CDT 2020 midway2-login1.rcc.local:/scratch/midway2/caiqi/IRFinder_1/IRFinder-1.2.6_hotfix_R_bug/src/trim"' TrimReads.o sequenceTools.o trim.o -static -static-libgcc

  2. Can you see a binary file trim in the folder?

[caiqi@midway2-login1 trim]$ ls

Depend.list TrimReads.cpp TrimReads.o sequenceTools.cpp sequenceTools.o trim.cpp

Makefile TrimReads.h includedefine.h sequenceTools.h trim trim.o

I got there. There are warnings, though.

Cai


dg520 commented 4 years ago

Great. The warning here is OK. It also means you don't need to load GLIBC. In the src/trim folder, have you run the following command?

cp trim ../../bin/util/

If you haven't, do it now. This copies the newly compiled trim over the original one that ships with IRFinder. The new trim is built for your system, which should resolve the incompatibility.

Similarly, you also need to go to src/irfinder and src/winflat and run the make command in each of these two folders. This should generate irfinder and winflat in the respective folders. Copy each of them to IRFinder-1.2.6_hotfix_R_bug/bin/util to overwrite the original files.
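The rebuild steps above (make in src/trim, src/irfinder and src/winflat, then copy each binary into bin/util) can be collected into one small shell function. This is a sketch under stated assumptions, not part of IRFinder: the function name rebuild_irfinder_utils is hypothetical, it must be run from the IRFinder-1.2.6_hotfix_R_bug folder, and it assumes each src/<tool> folder produces a binary named after the folder, as described above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rebuild IRFinder's bundled binaries on the local system
# and overwrite the prebuilt copies in bin/util.
# Assumes the current directory is the IRFinder root (e.g. IRFinder-1.2.6_hotfix_R_bug).
rebuild_irfinder_utils() {
  local tool
  for tool in trim irfinder winflat; do
    # Build inside src/<tool>; the subshell keeps the caller's working directory unchanged.
    if ! (cd "src/$tool" && make); then
      echo "make failed in src/$tool; please post the full make output" >&2
      return 1
    fi
    # The binary should be src/<tool>/<tool>, not the .o or .cpp files.
    if [[ ! -f "src/$tool/$tool" ]]; then
      echo "binary src/$tool/$tool not found after make" >&2
      return 1
    fi
    # Overwrite the original prebuilt binary shipped with IRFinder.
    cp "src/$tool/$tool" bin/util/
  done
}
```

Usage: `rebuild_irfinder_utils && echo "trim, irfinder and winflat rebuilt"`. Stopping at the first failed build keeps the make messages for that tool visible, which is the output worth posting if something goes wrong.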

Let me know if any of the above steps fail with error messages.

Now you can try to re-run IRFinder on your test sample. Here are two of my suggestions:

  1. If you can, download the sample from the same source I used: you should get fastq.gz files from the EBI site I mentioned. Don't unzip them; you can feed the gzipped files to IRFinder directly.
  2. When you submit the IRFinder job to your server, avoid the --mem-per-cpu option. I'm not familiar with your server, but there should be an option to set the overall memory instead of setting it per core. Set the overall memory to 64GB if you can, because some steps in IRFinder run on a single core and use a lot of memory.
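Both suggestions can be combined in one Slurm batch script that passes the gzipped FASTQ files as-is and requests one overall memory pool (--mem) instead of a per-core amount (--mem-per-cpu). This is a sketch with several assumptions: the partition bigmem2 and the 10 cores are taken from the srun commands in this thread, 64G from the suggestion above, the helper name write_irfinder_job is hypothetical, and IRFinder's -t thread option should be confirmed with `bin/IRFinder -h` on your installation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a Slurm job script that runs IRFinder on gzipped
# paired-end FASTQ files with an overall (not per-core) memory request.
# Usage: write_irfinder_job REF_DIR OUT_DIR R1.fastq.gz R2.fastq.gz | sbatch
write_irfinder_job() {
  local ref=$1 outdir=$2 r1=$3 r2=$4
  local threads=${THREADS:-10}   # cores requested from Slurm (override with THREADS=...)
  local mem=${MEM:-64G}          # total memory for the whole job (override with MEM=...)
  cat <<EOF
#!/bin/bash
#SBATCH --partition=bigmem2
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=${threads}
#SBATCH --mem=${mem}
# The .fastq.gz files are passed without unzipping; -t is assumed to set the thread count.
bin/IRFinder -t "${threads}" -r "${ref}" -d "${outdir}" "${r1}" "${r2}"
EOF
}
```

Usage: `write_irfinder_job REF/Mouse-GRCm38 SRR645846_irfinder SRR645846_1.fastq.gz SRR645846_2.fastq.gz | sbatch`. sbatch reads the job script from standard input when no file name is given, so the script never has to be saved; printing it first without the pipe is an easy way to inspect the resource requests.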

Let me know how this goes.

qicaibiology commented 4 years ago

OK. It works well so far.

I am going to run it again using the same downloaded data as yours.

Thanks,

Cai


From: Dadi notifications@github.com Sent: Thursday, April 16, 2020 5:36 PM To: williamritchie/IRFinder IRFinder@noreply.github.com Cc: Cai Qi caiqi@uchicago.edu; Mention mention@noreply.github.com Subject: Re: [williamritchie/IRFinder] Files missing in output folder (#83)


qicaibiology commented 4 years ago

Hi Dadi:

I have run the two fastq.gz files, and the BAM file looks like this:

[caiqi@midway2-login2 SRR645846_gz_irfinder]$ ls -ll Unsorted.bam

-rw-rw-r-- 1 caiqi caiqi 2855832356 Apr 16 19:42 Unsorted.bam

However, when I tried to pool the two replicates (SRR645846 and SRR645849), I still got the same error as yesterday.

[caiqi@midway2-login2 IRFinder-1.2.6_hotfix_R_bug]$ srun -c 1 --mem-per-cpu=100G -n 1 -N 1 -p bigmem2 bin/IRFinder -m BAM -r REF/Mouse-GRCm38/ -d pooled_div7/ <(samtools cat SRR645846_gz_irfinder/Unsorted.bam SRR645849_gz_irfinder/Unsorted.bam) >log1 2>error1 &

[caiqi@midway2-login2 IRFinder-1.2.6_hotfix_R_bug]$ cat error1

srun: job 122613 queued and waiting for resources

srun: job 122613 has been allocated resources

gzip: /dev/fd/63.gz: No such file or directory

Illegal division by zero at /scratch/midway2/caiqi/IRFinder_1/IRFinder-1.2.6_hotfix_R_bug/bin/util/warnings line 207.

srun: error: midway2-bigmem01: task 0: Exited with exit code 255

I looked at bin/util/warnings, but it is hard to tell which line is 207. Sorry to bother you again. I checked, and there is nothing in irfinder.stderr or WARNINGS.

Which part caused this error?

Thanks a lot,

Cai


From: Cai Qi caiqi@uchicago.edu Sent: Thursday, April 16, 2020 6:25 PM To: williamritchie/IRFinder IRFinder@noreply.github.com; williamritchie/IRFinder reply@reply.github.com Cc: Mention mention@noreply.github.com Subject: Re: [williamritchie/IRFinder] Files missing in output folder (#83)


qicaibiology commented 4 years ago

I looked at the output of the run without replicates and found that the output folder contains no text file like:

IRFinder-IR-dir.txt

The folder contains these files:

[caiqi@midway2-login1 SRR645846_gz_irfinder]$ ls

IRFinder-ChrCoverage.txt IRFinder-SpansPoint.txt Log.std.out irfinder.stderr

IRFinder-IR-nondir.txt Log.final.out SJ.out.tab irfinder.stdout

IRFinder-JuncCount.txt Log.out Unsorted.bam trim.log

IRFinder-ROI.txt Log.progress.out WARNINGS unsorted.frag.bam

Size wise:

[caiqi@midway2-login1 SRR645846_gz_irfinder]$ ls -lh

total 6.8G

-rw-rw-r-- 1 caiqi caiqi 536 Apr 16 19:42 IRFinder-ChrCoverage.txt

-rw-rw-r-- 1 caiqi caiqi 22M Apr 16 19:42 IRFinder-IR-nondir.txt

-rw-rw-r-- 1 caiqi caiqi 7.7M Apr 16 19:42 IRFinder-JuncCount.txt

-rw-rw-r-- 1 caiqi caiqi 31K Apr 16 19:42 IRFinder-ROI.txt

-rw-rw-r-- 1 caiqi caiqi 6.6M Apr 16 19:42 IRFinder-SpansPoint.txt

-rw-rw-r-- 1 caiqi caiqi 1.9K Apr 16 19:42 Log.final.out

-rw-rw-r-- 1 caiqi caiqi 21K Apr 16 19:42 Log.out

-rw-rw-r-- 1 caiqi caiqi 3.3K Apr 16 19:42 Log.progress.out

-rw-rw-r-- 1 caiqi caiqi 158 Apr 16 19:42 Log.std.out

-rw-rw-r-- 1 caiqi caiqi 6.1M Apr 16 19:42 SJ.out.tab

-rw-rw-r-- 1 caiqi caiqi 2.7G Apr 16 19:42 Unsorted.bam

-rw-rw-r-- 1 caiqi caiqi 0 Apr 16 19:42 WARNINGS

-rw-rw-r-- 1 caiqi caiqi 0 Apr 16 19:13 irfinder.stderr

-rw-rw-r-- 1 caiqi caiqi 1.3K Apr 16 19:42 irfinder.stdout

-rw-rw-r-- 1 caiqi caiqi 709 Apr 16 19:37 trim.log

-rw-rw-r-- 1 caiqi caiqi 4.1G Apr 16 19:42 unsorted.frag.bam


From: Cai Qi caiqi@uchicago.edu Sent: Thursday, April 16, 2020 9:47 PM To: williamritchie/IRFinder IRFinder@noreply.github.com; williamritchie/IRFinder reply@reply.github.com Cc: Mention mention@noreply.github.com Subject: Re: [williamritchie/IRFinder] Files missing in output folder (#83)


dg520 commented 4 years ago

Hi @qicaibiology, three things I notice:

  1. Your sample SRR645846 ran successfully, with all output generated correctly. This is good and big progress: it means you can now run IRFinder per sample.
  2. You will never see IRFinder-IR-dir.txt in this output folder. This is because IRFinder detects sample SRR645846 as a non-directional RNA-Seq library (i.e. we can't tell whether a read comes from a gene or from that gene's antisense strand), so IRFinder won't report IR in a directional manner. This is described in the manual (https://github.com/williamritchie/IRFinder/wiki/IR-Quantification-Output).
  3. For your pooling issue, I'm not sure what happened on your side; I don't have any problem here. Could you please first check whether sample SRR645849 was quantified successfully? The output folder should contain files with the following sizes (by ls -ll):
    -rw-rw----. 1 XXX XXX        533 Apr 17 00:50 IRFinder-ChrCoverage.txt
    -rw-rw----. 1 XXX XXX   22510901 Apr 17 00:50 IRFinder-IR-nondir.txt
    -rw-rw----. 1 XXX XXX    7991033 Apr 17 00:50 IRFinder-JuncCount.txt
    -rw-rw----. 1 XXX XXX      30919 Apr 17 00:50 IRFinder-ROI.txt
    -rw-rw----. 1 XXX XXX    6894915 Apr 17 00:50 IRFinder-SpansPoint.txt
    -rw-rw----. 1 XXX XXX          0 Apr 17 00:43 irfinder.stderr
    -rw-rw----. 1 XXX XXX       1426 Apr 17 00:50 irfinder.stdout
    -rw-rw----. 1 XXX XXX       1857 Apr 17 00:50 Log.final.out
    -rw-rw----. 1 XXX XXX      17898 Apr 17 00:50 Log.out
    -rw-rw----. 1 XXX XXX        718 Apr 17 00:50 Log.progress.out
    -rw-rw----. 1 XXX XXX        158 Apr 17 00:50 Log.std.out
    -rw-rw----. 1 XXX XXX    6309147 Apr 17 00:50 SJ.out.tab
    -rw-rw----. 1 XXX XXX        703 Apr 17 00:49 trim.log
    -rw-rw----. 1 XXX XXX 2661662284 Apr 17 00:50 Unsorted.bam
    -rw-rw----. 1 XXX XXX 4031322319 Apr 17 00:50 unsorted.frag.bam
    -rw-rw----. 1 XXX XXX          0 Apr 17 00:50 WARNINGS

    If everything looks the same for you, please try pooling in two separate steps. Step 1:

    (samtools cat SRR645846/Unsorted.bam SRR645849/Unsorted.bam) > tmp.bam

    This merges the two BAM files into a temporary one, with a file size of 5517420429 bytes. See if you can get there without errors. If so, run Step 2:

    IRFinder -m BAM -r REF/Mouse-GRCm38 -d pooled_div7 tmp.bam

    A successful run should produce the following files in the pooled_div7 folder:

    -rw-rw----. 1 XXX XXX        564 Apr 17 01:00 IRFinder-ChrCoverage.txt
    -rw-rw----. 1 XXX XXX   23341407 Apr 17 01:00 IRFinder-IR-nondir.txt
    -rw-rw----. 1 XXX XXX    8793682 Apr 17 01:00 IRFinder-JuncCount.txt
    -rw-rw----. 1 XXX XXX      30947 Apr 17 01:00 IRFinder-ROI.txt
    -rw-rw----. 1 XXX XXX    6931455 Apr 17 01:00 IRFinder-SpansPoint.txt
    -rw-rw----. 1 XXX XXX          0 Apr 17 00:52 irfinder.stderr
    -rw-rw----. 1 XXX XXX       1386 Apr 17 01:00 irfinder.stdout
    -rw-rw----. 1 XXX XXX 8361357781 Apr 17 01:00 unsorted.frag.bam
    -rw-rw----. 1 XXX XXX          0 Apr 17 01:00 WARNINGS

    Let me know how this two-step method works, and post all error messages here if you see any.
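The two-step pooling can be scripted so that each step is checked before the next one starts, and the output folder is then compared against the files listed above. This sketch writes the merged BAM to a regular file with `samtools cat -o` rather than passing a process substitution, then runs IRFinder in BAM mode. The function names pool_and_quantify and check_irfinder_output are hypothetical, and the set of files it checks is an assumption drawn from the successful listings in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report IRFinder outputs that are missing or empty in OUT_DIR.
# The IR table is IRFinder-IR-nondir.txt for non-directional libraries and
# IRFinder-IR-dir.txt for directional ones, so either one is accepted.
check_irfinder_output() {
  local outdir=$1 f status=0
  for f in IRFinder-ChrCoverage.txt IRFinder-JuncCount.txt IRFinder-ROI.txt IRFinder-SpansPoint.txt; do
    if [[ ! -s "$outdir/$f" ]]; then
      echo "missing or empty: $f"
      status=1
    fi
  done
  if [[ ! -s "$outdir/IRFinder-IR-nondir.txt" && ! -s "$outdir/IRFinder-IR-dir.txt" ]]; then
    echo "missing or empty: IRFinder-IR-nondir.txt / IRFinder-IR-dir.txt"
    status=1
  fi
  # WARNINGS is empty in the successful runs above, so a non-empty file is worth reading.
  if [[ -s "$outdir/WARNINGS" ]]; then
    echo "WARNINGS is not empty"
    status=1
  fi
  return "$status"
}

# Hypothetical helper: Step 1 merges per-sample BAMs into a regular file,
# Step 2 runs IRFinder in BAM mode on it, then the output folder is checked.
# Usage: pool_and_quantify REF_DIR OUT_DIR sample1/Unsorted.bam sample2/Unsorted.bam ...
pool_and_quantify() {
  local ref=$1 outdir=$2
  shift 2
  local merged="${outdir}.tmp.bam"
  samtools cat -o "$merged" "$@" || { echo "step 1 (samtools cat) failed" >&2; return 1; }
  bin/IRFinder -m BAM -r "$ref" -d "$outdir" "$merged" || { echo "step 2 (IRFinder) failed" >&2; return 1; }
  check_irfinder_output "$outdir"
}
```

Usage, from the IRFinder folder: `pool_and_quantify REF/Mouse-GRCm38 pooled_div7 SRR645846_gz_irfinder/Unsorted.bam SRR645849_gz_irfinder/Unsorted.bam`. check_irfinder_output can also be run on its own against a single-sample folder such as SRR645849_gz_irfinder before pooling.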

BTW, you can run IRFinder multi-threaded, which is more efficient. Just make sure your memory allocation is set as an overall amount instead of per core. IRFinder usually needs a total memory of 48 to 64 GB.

qicaibiology commented 4 years ago

Hi Dadi:

Again, thanks a lot for your mentoring. Here are the files:

[caiqi@midway2-login2 SRR645849_gz_irfinder]$ ls -ll

total 6579328

-rw-rw-r-- 1 caiqi caiqi 533 Apr 16 20:54 IRFinder-ChrCoverage.txt

-rw-rw-r-- 1 caiqi caiqi 22510901 Apr 16 20:54 IRFinder-IR-nondir.txt

-rw-rw-r-- 1 caiqi caiqi 7991033 Apr 16 20:54 IRFinder-JuncCount.txt

-rw-rw-r-- 1 caiqi caiqi 30919 Apr 16 20:54 IRFinder-ROI.txt

-rw-rw-r-- 1 caiqi caiqi 6894915 Apr 16 20:54 IRFinder-SpansPoint.txt

-rw-rw-r-- 1 caiqi caiqi 1856 Apr 16 20:54 Log.final.out

-rw-rw-r-- 1 caiqi caiqi 20674 Apr 16 20:54 Log.out

-rw-rw-r-- 1 caiqi caiqi 3314 Apr 16 20:54 Log.progress.out

-rw-rw-r-- 1 caiqi caiqi 158 Apr 16 20:54 Log.std.out

-rw-rw-r-- 1 caiqi caiqi 6309147 Apr 16 20:54 SJ.out.tab

-rw-rw-r-- 1 caiqi caiqi 2661630707 Apr 16 20:54 Unsorted.bam

-rw-rw-r-- 1 caiqi caiqi 0 Apr 16 20:54 WARNINGS

-rw-rw-r-- 1 caiqi caiqi 0 Apr 16 20:24 irfinder.stderr

-rw-rw-r-- 1 caiqi caiqi 1279 Apr 16 20:54 irfinder.stdout

-rw-rw-r-- 1 caiqi caiqi 704 Apr 16 20:49 trim.log

-rw-rw-r-- 1 caiqi caiqi 4031322295 Apr 16 20:54 unsorted.frag.bam

Some sizes are the same and some are not, but I tried it anyway. It looks like we are getting there.

I will keep you updated.

Cai


From: Dadi notifications@github.com Sent: Friday, April 17, 2020 8:56 AM To: williamritchie/IRFinder IRFinder@noreply.github.com Cc: Cai Qi caiqi@uchicago.edu; Mention mention@noreply.github.com Subject: Re: [williamritchie/IRFinder] Files missing in output folder (#83)


qicaibiology commented 4 years ago

Hi Dadi (@dg520),

Eventually it went through! Thank you so much for the patient and detailed mentoring on this package. I can see some genes that I know should be detected.

I have two extra questions. First, can IRFinder compare multiple conditions (three or more), such as different developmental stages like DIV0, DIV1, and DIV7? Second, during the troubleshooting you checked whether the FASTA file is consistent with the GTF file; how can I check that? That is the reason why, for my previous mapping, I only downloaded references from iGenomes, since the FASTA and GTF files are included in the same folder. Thank you again for your generous help; I appreciate it a lot.

Cai

dg520 commented 4 years ago

Hi @qicaibiology, I'm glad things have turned out well. For your questions:

  1. IRFinder can handle multiple conditions using the GLM approach; see the manual for more details. You need at least 3 samples per condition, though.

  2. To extract chromosome names from the FASTA file, you can use

    grep ">" genome.fa|awk 'BEGIN{FS=" "}{gsub(">","",$1);print $1}'|sort|uniq

    To extract the chromosome names in the GTF file, skipping any header lines that start with #, you can use

    grep -v "^#" transcripts.gtf|cut -f1|sort|uniq

    On Linux, there are many commands, such as diff and comm, that compare the contents of two lists and show where they differ. Please read their manual pages.

  3. I strongly encourage you to read more about how to use the job submission system on your server. It's critical to allocate resources in a correct and robust way. It is equally important to pay attention to the warnings and error messages produced by analysis tools; otherwise, those tools might either fail or lead you to quite wrong results. Consulting your server administrator or bioinformatician colleagues is definitely helpful.
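The two chromosome listings from point 2 can be compared in one step with comm, which requires both inputs to be sorted in the same collation order. This is a sketch, not part of IRFinder: compare_chroms is a hypothetical name, and it assumes GTF header lines start with # (as in Ensembl GTF files) and that a FASTA sequence name is the first word after >, matching the commands above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare sequence names in a genome FASTA with the
# chromosome column (column 1) of a GTF annotation.
# Usage: compare_chroms genome.fa transcripts.gtf
compare_chroms() {
  local fasta=$1 gtf=$2
  # FASTA: first word of each header line, with the leading '>' removed.
  # GTF: skip header lines starting with '#', keep column 1.
  # LC_ALL=C gives sort and comm the same byte-wise ordering.
  # comm -3 hides names shared by both lists: names only in the FASTA are printed
  # in column 1, names only in the GTF in column 2 (prefixed by a tab).
  LC_ALL=C comm -3 \
    <(grep '^>' "$fasta" | awk '{sub(/^>/, "", $1); print $1}' | LC_ALL=C sort -u) \
    <(grep -v '^#' "$gtf" | cut -f1 | LC_ALL=C sort -u)
}
```

An empty result means every name appears in both files. A systematic naming mismatch, for example chr1 in one file and 1 in the other, shows up as two long parallel columns, which is the kind of inconsistency worth fixing before building the IRFinder reference.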

Would you mind closing this thread if your problem has been solved? I really appreciate your patience and quick responses during the troubleshooting process.

Best, Dadi

qicaibiology commented 4 years ago

Sure! I will! Thanks!

Thanks,

Cai


On Apr 17, 2020, at 1:20 PM, Dadi notifications@github.com wrote:


