icbi-lab / NeoFuse

NeoFuse is a user-friendly pipeline for the prediction of fusion neoantigens from tumor RNA-seq data.
GNU General Public License v3.0
17 stars 9 forks source link

Error occured during OptiType run #3

Closed aditisk closed 4 years ago

aditisk commented 4 years ago

Hello,

I'm running into an error with OptiType. I'm pasting the lines from the R.optitype.log file below:

0:00:01.97 Generating binary hit matrix. Traceback (most recent call last): File "/usr/local/bin/OptiType-1.3.2/OptiTypePipeline.py", line 357, in pos, read_details = ht.pysam_to_hdf(bam_paths[0]) File "/usr/local/bin/OptiType-1.3.2/hlatyper.py", line 186, in pysam_to_hdf sam = pysam.AlignmentFile(samfile, sam_or_bam) File "pysam/calignmentfile.pyx", line 311, in pysam.calignmentfile.AlignmentFile.cinit (pysam/calignmentfile.c:4929) File "pysam/calignmentfile.pyx", line 510, in pysam.calignmentfile.AlignmentFile._open (pysam/calignmentfile.c:7138) ValueError: file header is empty (mode='rb') - is it SAM/BAM format?

How can I fix this ? Thanks.

aditisk commented 4 years ago

I am also attaching the stack trace from the error file of my job submitted to the cluster:

Couldn't create temporary file /scratch/slurm-1124957/SQNlHrx19. (No such file or directory) /usr/local/bin/seqan/include/seqan/file/string_mmap.h:635 FAILED! (Memory Mapped String couldn't open temporary file)

stack trace: 0 [0x55876c16255e] seqan::ClassTest::fail() + 0xe 1 [0x55876c267824] yara_mapper(+0x341824) 2 [0x55876c2b0e17] bool seqan::_remap<unsigned long, seqan::MMapConfig<seqan::File<seqan::Async >, unsigned long>, unsigned long>(seqan::String<unsigned long, seqan::MMap<seqan::MMapConfig<seqan::File<seqan::Async >, unsigned long> > >&, unsigned long) + 0x47 3 [0x55876c2b2504] SeqStore<void, YaraContigsConfig<seqan::MMap<seqan::MMapConfig<seqan::File<seqan::Async >, unsigned long> > > >::SeqStore() + 0x124 4 [0x55876c135b2b] yara_mapper(+0x20fb2b) 5 [0x55876c03bc15] main + 0x5c5 6 [0x7ff665109b97] __libc_start_main + 0xe7 7 [0x55876c03beca] _start + 0x2a

abyssum commented 4 years ago

Hello @aditisk ,

Could you:

  1. Give the command you use to call NeoFuse with.
  2. Check if there is a BAM file under the dir /path/to/output_dir/sample_ID/OptiType/temp/
  3. If there is a BAM file, could you do a $ head /path/to/output_dir/sample_ID/OptiType/temp/$FILE_mapped_1.bam and paste the result.

From the looks of it, YARA cannot create a temp file probably because your system doesn't have a /temp/ directory (this is something I will probably have to change in the next version of NeoFuse).

As a quick-fix for now, try to export a temp dir variable to your working environment (this is something you would need to do everytime your shell will flash your env variables, so I would suggest to either modify your .bashrc or include the export command before your NeoFuse call in your script):

$ export TEMPDIR=/path/to/tempDir/

Also make sure you have access to the directory you are assigning as $TEMPDIR (for example you can use your scratch space).

Let me know if that works.

aditisk commented 4 years ago

Hi @abyssum,

Thank you for the prompt response. I changed the TEMPDIR variable to my working environment and I am still getting the same error as before. Here are my responses to what you need:

  1. Here is my command:

/zfs1/rferris/adk85/Tools/NeoFuse_files/NeoFuse \ -1 /zfs1/rferris/adk85/Tools/NeoFuse_files/test/R_9080_T_ZAN174A15_S46_R1_001.fastq \ -2 /zfs1/rferris/adk85/Tools/NeoFuse_files/test/R_9080_T_ZAN174A15_S46_R2_001.fastq \ -s /zfs1/rferris/adk85/Tools/NeoFuse_files/STAR_idx \ -g /bgfs/genomics/refs/igenomes/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta \ -a /bgfs/genomics/fangping/nfcore/rnaseq/refs/GRCh38/Homo_sapiens.GRCh38.96.gtf \ --singularity \ -o /zfs1/rferris/adk85/Tools/NeoFuse_files/test

  1. The BAM file exists at this location: /zfs1/rferris/adk85/Tools/NeoFuse_files/test/R/OptiType/tmp

  2. Here is the output of the head command: [adk85@login0b tmp]$ head R_mapped_1.bam ?BC!sr?ed?|?t? [adk85@login0b tmp]$

Thanks for your help with troubleshooting.

abyssum commented 4 years ago

Hello @aditisk,

So the BAM file is produced and it's in binary format. But from your command I can see that you are using a whole-genome reference FASTA file and an RNAseq GTF file, is there a specific reason for that?

I can also see that you are using the STAR index created by NeoFuse (presumably you used NeoFuse -R command to download and index the reference genome). If that is the case, under the /NeoFuse_files/ directory, there should be two files:

/gencode.v31.annotation.gtf
/GRCh38.primary_assembly.genome.fa

Could you delete your output folder (just to make sure there are no file colisions) and change your command to something like:

/zfs1/rferris/adk85/Tools/NeoFuse_files/NeoFuse \
-1 /zfs1/rferris/adk85/Tools/NeoFuse_files/test/R_9080_T_ZAN174A15_S46_R1_001.fastq \
-2 /zfs1/rferris/adk85/Tools/NeoFuse_files/test/R_9080_T_ZAN174A15_S46_R2_001.fastq \
-s /zfs1/rferris/adk85/Tools/NeoFuse_files/STAR_idx \
-g /zfs1/rferris/adk85/Tools/NeoFuse_files/GRCh38.primary_assembly.genome.fa \
-a /zfs1/rferris/adk85/Tools/NeoFuse_files/gencode.v31.annotation.gtf  \
--singularity \
-o /zfs1/rferris/adk85/Tools/NeoFuse_files/test

Let me know if that works.

Could you also do a

$ head /zfs1/rferris/adk85/Tools/NeoFuse_files/test/R_9080_T_ZAN174A15_S46_R1_001.fastq

and paste the result?

aditisk commented 4 years ago

Hi again @abyssum,

I was using that FASTA reference because I was using the same reference file for variant calling on RNAseq reads. It is the GATK version which has virus sequences and some other annotations. Does that make a difference ?

As per your suggestion, I changed the files to use the ones created by Neofuse -R but I'm still running into the exact same error.

Our system admin told me that the variable for the temp directory is TMPDIR (as opposed to TEMPDIR) but that doesn't make any difference either.

Here is the output from my reads. I have used the same files for other analysis and they worked fine so I don't think the issue is linked to these. What do you think ?

$ head /zfs1/rferris/adk85/Tools/NeoFuse_files/test/R_9080_T_ZAN174A15_S46_R1_001.fastq @A00522:58:HKMHWDMXX:1:1101:1217:1000 1:N:0:GAATTCGT+TAATCTTA CNCGTACATGGACATGGGCCTACTGCCCCGGGCAGACGGACGTCTCTTTAAGGAAGAGTGGTTGGAAGTGTCTGTGTACTCAGAACCAGTTTGCACCTGAT + F#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF @A00522:58:HKMHWDMXX:1:1101:1344:1000 1:N:0:GAATTCGT+TAATCTTA ANTCATCCAATCCAGCATCCTCAGTATCTTCTTCTTCTTCTGGTTCAACAGGAGGTGGTGTCTCTTCCTTTTCTGGTACTCCTTGTTCCATAACTTCTACA + F#FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF @A00522:58:HKMHWDMXX:1:1101:4652:1000 1:N:0:GAATTCGT+TAATCTTA CNTGAGTTTTGCCAGCCCGCGTGCGCGCCTGCTGTTCCGGGGCCGAATGGGAGCAGATCTTCTTGGCCAGGCTCTGCAGGTAGGCGTCCTTGGCGAGTAGA

abyssum commented 4 years ago

Hello @aditisk,

Regarding the FASTA file, no it shouldn't affect this step since OptiType uses its own reference for HLA typing. But you might have issues when it comes to gene fusion calling from arriba and assigning genes to mapped reads with featureCounts if the FASTA and GTF files don't match (there are transcripts in the GTF not found in the FASTA file etc.).

I managed to reproduce your error by removing the header from a BAM file produced by yara_mapper. Could you do a

$ samtools view -h /path/to/temp/R_mapped_1.bam | head
$ samtools flagstat /path/to/temp/R_mapped_1.bam

And paste the result?

If that is the case, for some reason YARA returns an empty/truncated BAM file (I will check what can cause this behaviour in the meantime).

aditisk commented 4 years ago

Hi @abyssum,

Good to know the specifics about matched FASTA and GTF files, will keep that in mind.

Here are the samtools outputs:

samtools view -h R_mapped_1.bam | head output is blank

[adk85@login0b tmp]$ samtools flagstat R_mapped_1.bam 0 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 secondary 0 + 0 supplementary 0 + 0 duplicates 0 + 0 mapped (N/A : N/A) 0 + 0 paired in sequencing 0 + 0 read1 0 + 0 read2 0 + 0 properly paired (N/A : N/A) 0 + 0 with itself and mate mapped 0 + 0 singletons (N/A : N/A) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5)

aditisk commented 4 years ago

Hi @abyssum,

Our system admin/bioinformatics support person got back to me. He has identified and fixed the problem. Here is what he said:

NeoFuse used singularity to run various tools. NeoFuse has not specified the bind paths of the temp directory in singularity. I did a specific fix to bind /scratch. If you prefer a permanent fix, you can contact the code developers to specify the bind path of the temp directory in singularity.

I hope this helps in making the necessary changes to NeoFuse.

On a side note, a question about the interpretation of the output files:

For the sample mentioned above, the filtered file has only 1 Neoantigen and the unfiltered file has 106. What is a good IC50/rank cut-off ? How lenient can one be with these cut-offs ?

Thanks for your help.

abyssum commented 4 years ago

Hello @aditisk,

Sorry for the late reply, I am glad it worked out eventually.

Yes, your sysadmin is right (that's what I was trying to address with the export cmd and it is planned to be implemented in the next update), thanks for pointing this out.

Regarding your question, the general thresholds for predicting intermediate-strong binders would be: IC50 < 500 (that's NeoFuse default) Rank < 2 (NeoFuse default is Inf)

I would say that's as lenient as it gets, anything with an IC50 > 500 nM and/or Rank > 2 is considered a weak binder. You can always filter only for percentile rank (Rank < 2) and set IC50 = Inf with -t Inf -T 2 options.

Note that by default NeoFuse will search for Neoantigens with peptide length of 8 a.a. You might want to consider setting -m 8 -M 11 so you can get epitopes of length 8,9,10 and 11 as well.

I will close the issue, please feel free to reopen if you encounter a similar error.

aditisk commented 4 years ago

Thank you so much for your feedback.

aditisk commented 4 years ago

Hi @abyssum,

Just thought I'd let you know that the unfiltered.tsv file is missing 2 column headers. I assume they should be Breakpoint1 and Breakpoint2 (based on the filtered file). As a result of missing headers, if you open the sheet in excel, all the remaining column headers are shifted and hence don't align with the actual data in that column.

abyssum commented 4 years ago

Hello @aditisk,

Thank you for reporting this to me. I'll try to make a hotfix before the next update of NeoFuse.