Closed Squirrelity closed 1 year ago
Hello,thank to your amazing tool again. I tried to solve the problem, but now I still can't run your tool. I think most of the problems are because of the version of GSNAP. I delete the str "'--ignore-trim-in-filtering','1'" in tRNAmap.py ,because the GSNAP version 2021-03-08. Now it has the following error(align.log file):
GSNAP version 2021-03-08 called with args: gsnap.avx2 --gunzip -D hg38_HEK239vsK562_1/Hsap_tRNAgenome -d Hsap_tRNAgenome -V hg38_HEK239vsK562_1/Hsapsnp_index -v hg38_test_modificationSNPs -t 15 --split-output hg38_HEK239vsK562_1/mimseq_hek_1 --format sam --genome-unk-mismatch 0 --md-lowercase-snp --max-mismatches 0.1 ./mim-tRNAseq/demo/mim-tRNAseq-master/mimseq_hek_1.fastq.gz
Neither novel splicing (-N) nor known splicing (-s) turned on => assume reads are DNA-Seq (genomic)
Checking compiler assumptions for SSE2: 6B8B4567 327B23C6 xor=59F066A1
Checking compiler assumptions for SSE4.1: -103 -58 max=198 => compiler zero extends
Checking compiler assumptions for SSE4.2 options: 6B8B4567 __builtin_clz=1 __builtin_ctz=0 _mm_popcnt_u32=17 __builtin_popcount=17
Finished checking compiler assumptions
Allocating memory for compressed genome (oligos)...done (9,224 bytes, 0.00 sec)
Allocating memory for compressed genome (bits)...done (9,264 bytes, 0.00 sec)
Allocating memory for compressed genome (oligos)...done (9,216 bytes, 0.00 sec)
Allocating memory for compressed genome (bits)...done (9,264 bytes, 0.00 sec)
Looking for genome Hsap_tRNAgenome in directory hg38_HEK239vsK562_1/Hsap_tRNAgenome
Cannot find genomic index files in either current or old format. Looking for files containing ref
I tried to find the version of GSNAP-2019-02-26 but failed in http://research-pub.gene.com/gmap/. Could you do me a favor or upload the version of GSNAP-2019-02-26? Thank you very much.
Hi there,
Sorry for the delayed response! Indeed your issues are because of the GSNAP version. This should have automatically been handled on installation if you used conda/mamba. It needs to be version GSNAP-2019-02-26, which conda installs, but perhaps you installed GSNAP manually?
The correct version is available through conda, and you can install it using the following command: conda install -c bioconda gmap=2019.02.26
Thanks to your reply! Now I can install the version of gmap-2019.02.26.
gsnap --version
Note: /home/owner/miniconda3/envs/mimseq/pkgs/gmap-2019.02.26-pl526h2f06484_0/bin/gsnap.avx2 does not exist. For faster speed, may want to compile package on an AVX2 machine
GSNAP version 2019-02-26 called with args: /home/owner/miniconda3/envs/mimseq/pkgs/gmap-2019.02.26-pl526h2f06484_0/bin/gsnap.sse42 --version
GSNAP: Genomic Short Nucleotide Alignment Program
Part of GMAP package, version 2019-02-26
Build target: x86_64-unknown-linux-gnu
Features: pthreads enabled, no alloca, zlib available, mmap available, littleendian, sigaction available, 64 bits available
Popcnt: mm_popcnt builtin_popcount
Builtin functions: builtin_clz builtin_ctz builtin_popcount
SIMD functions compiled: SSE2 SSSE3 SSE4.1 SSE4.2
Sizes: off_t (8), size_t (8), unsigned int (4), long int (8), long long int (8)
Default gmap directory (compiled): /opt/conda/conda-bld/gmap_1592898626837/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/share
Default gmap directory (environment): /opt/conda/conda-bld/gmap_1592898626837/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/share
Maximum stack read length: 300
Thomas D. Wu, Genentech, Inc.
Contact: twu@gene.com
However, there are still some problems when I run mimseq.
Traceback (most recent call last):
File "/home/owner/miniconda3/envs/mimseq/bin/mimseq", line 8, in <module>
sys.exit(main())
File "/home/owner/miniconda3/envs/mimseq/lib/python3.7/site-packages/mimseq/mimseq.py", line 411, in main
args.misinc_thresh, args.mito, args.plastid, args.pretrnas, args.local_mod, args.p_adj, args.sampledata)
File "/home/owner/miniconda3/envs/mimseq/lib/python3.7/site-packages/mimseq/mimseq.py", line 108, in mimseq
snp_index_path, snp_index_name, out, threads, snp_tolerance, keep_temp, mismatches, map_round, remap)
File "/home/owner/miniconda3/envs/mimseq/lib/python3.7/site-packages/mimseq/tRNAmap.py", line 48, in mainAlign
unique_bam, librarySize, alignstats = mapReads(fq, genome_index_path, genome_index_name, snp_index_path, snp_index_name, threads, out_dir, snp_tolerance, keep_temp, mismatches, remap)
File "/home/owner/miniconda3/envs/mimseq/lib/python3.7/site-packages/mimseq/tRNAmap.py", line 115, in mapReads
subprocess.check_call(map_cmd, stderr = open(out_dir + "align.log", "a"))
File "/home/owner/miniconda3/envs/mimseq/lib/python3.7/subprocess.py", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/owner/miniconda3/envs/mimseq/pkgs/gmap-2019.02.26-pl526h2f06484_0/bin/gsnap', '--gunzip', '-D', 'hg38_HEK239vsK562_1/Hsap_tRNAgenome', '-d', 'Hsap_tRNAgenome', '-V', 'hg38_HEK239vsK562_1/Hsapsnp_index', '-v', 'hg38_test_modificationSNPs', '-t', '15', '--split-output', 'hg38_HEK239vsK562_1/mimseq_k562_2', '--format', 'sam', '--genome-unk-mismatch', '0', '--md-lowercase-snp', '--ignore-trim-in-filtering', '1', '--max-mismatches', '0.1', '/home/owner/download/mim-tRNAseq/demo/mim-tRNAseq-master/mimseq_k562_2.fastq.gz']' died with <Signals.SIGSEGV: 11>.
So I read align.log, this is the tail message:
Note: /home/owner/miniconda3/envs/mimseq/pkgs/gmap-2019.02.26-pl526h2f06484_0/bin/gsnap.avx2 does not exist. For faster speed, may want to compile package on an AVX2 machine
GSNAP version 2019-02-26 called with args: /home/owner/miniconda3/envs/mimseq/pkgs/gmap-2019.02.26-pl526h2f06484_0/bin/gsnap.sse42 --gunzip -D hg38_HEK239vsK562_1/Hsap_tRNAgenome -d Hsap_tRNAgenome -V hg38_HEK239vsK562_1/Hsapsnp_index -v hg38_test_modificationSNPs -t 15 --split-output hg38_HEK239vsK562_1/mimseq_k562_2 --format sam --genome-unk-mismatch 0 --md-lowercase-snp --ignore-trim-in-filtering 1 --max-mismatches 0.1 /home/owner/download/mim-tRNAseq/demo/mim-tRNAseq-master/mimseq_k562_2.fastq.gz
Neither novel splicing (-N) nor known splicing (-s) turned on => assume reads are DNA-Seq (genomic)
Checking compiler assumptions for SSE2: 6B8B4567 327B23C6 xor=59F066A1
Checking compiler assumptions for SSE4.1: -103 -58 max=198 => compiler zero extends
Checking compiler assumptions for SSE4.2 options: 6B8B4567 __builtin_clz=1 __builtin_ctz=0 _mm_popcnt_u32=17 __builtin_popcount=17
Finished checking compiler assumptions
Allocating memory for compressed genome (oligos)...Attached new memory for hg38_HEK239vsK562_1/Hsap_tRNAgenome/Hsap_tRNAgenome.genomecomp...done (9,224 bytes, 0.00 sec)
Allocating memory for compressed genome (oligos)...Attached new memory for hg38_HEK239vsK562_1/Hsapsnp_index/Hsap_tRNAgenome.genomecomp.hg38_test_modificationSNPs...done (9,216 bytes, 0.00 sec)
Allocating memory for compressed genome (bits)...Attached new memory for hg38_HEK239vsK562_1/Hsap_tRNAgenome/Hsap_tRNAgenome.genomebits128...done (9,264 bytes, 0.00 sec)
Allocating memory for compressed genome (bits)...Attached new memory for hg38_HEK239vsK562_1/Hsapsnp_index/Hsap_tRNAgenome.genomebits128.hg38_test_modificationSNPs...done (9,264 bytes, 0.00 sec)
Looking for index files in directory hg38_HEK239vsK562_1/Hsapsnp_index
Pointers file is Hsap_tRNAgenome.ref151offsets64meta.hg38_test_modificationSNPs
Offsets file is Hsap_tRNAgenome.ref151offsets64strm.hg38_test_modificationSNPs
Positions file is Hsap_tRNAgenome.ref151positions.hg38_test_modificationSNPs
Offsets compression type: bitpack64
Allocating memory for ref (hg38_test_modificationSNPs) offset pointers, kmer 15, interval 1...Attached new memory for hg38_HEK239vsK562_1/Hsapsnp_index/Hsap_tRNAgenome.ref151offsets64meta.hg38_test_modificationSNPs...done (134,217,744 bytes, 0.10 sec)
Allocating memory for ref (hg38_test_modificationSNPs) offsets, kmer 15, interval 1...Attached new memory for hg38_HEK239vsK562_1/Hsapsnp_index/Hsap_tRNAgenome.ref151offsets64strm.hg38_test_modificationSNPs...done (361,840 bytes, 0.00 sec)
Allocating memory for ref (hg38_test_modificationSNPs) positions, kmer 15, interval 1...Attached new memory for hg38_HEK239vsK562_1/Hsapsnp_index/Hsap_tRNAgenome.ref151positions.hg38_test_modificationSNPs...done (139,292 bytes, 0.00 sec)
Reading SNPs file hg38_HEK239vsK562_1/Hsapsnp_index/hg38_test_modificationSNPs...done
Starting alignment. Writing results to hg38_HEK239vsK562_1/mimseq_k562_2.*
Length 48 of quality score differs from length 89 of nucleotides in sequence NB500982:157:HYLVMAFXY:2:21207:10311:17343
Signal received: SIGABRT
Calling Access_emergency_cleanup
Removed existing memory for shmid 3506182
Removed existing memory for shmid 3473413
Removed existing memory for shmid 3440644
Removed existing memory for shmid 3407875
Removed existing memory for shmid 3375106
Removed existing memory for shmid 3342337
Removed existing memory for shmid 3309568
You may want to run 'ipcs -m' to see if any shared memory segments are still in use
You can remove a shared memory segment manually by doing 'ipcrm -m <shmid>'
It looks like my computer doesn't have enough memory segments. I am not sure. How can I deal with it? Thanks for your reply.
Hi there,
So I was able to recreate the error based on something interesting in the log that I noticed before the error.
Starting alignment. Writing results to hg38_HEK239vsK562_1/mimseq_k562_2.* Length 48 of quality score differs from length 89 of nucleotides in sequence NB500982:157:HYLVMAFXY:2:21207:10311:17343 Signal received: SIGABRT
This is indicating that in the fastq file for mimseq_k562_2, one of the sequences has a quality score that does not match the length of the sequence. When I manually change the length of the quality score for a sequence in one of the inputs I get the same error. I think this specific input is truncated or corrupted in some way. Please download all 4 inputs again and give it another go! Hope that helps.
Thank you for your kind guidance.
With your help, I succeeded in running mimseq.
However, I still have some questions need to be answered.
I downloaded Hsap_HEK293T_42C_1h.fastq and Hsap_HEK293T_60C_1h.fastq as case data.
I used this barcodes.fa to cutadapt the samples.
`# Example barcodes.fa file for demultiplexing
I1
GATATCGTCA
I2
GATAGCTACA
I3
GATGCATACA
I4
GATTCTAGCA`
Actually I run the following commands. I got trim.fastq.gz and trimFinal.fastq.gz(Maybe I needn't to run the first one?) `> for i in ∗.fastq.gz
do fn=$(basename $i .fastq.gz)
cutadapt --no-indels -q 30,30 --trimmed-only -j 10 -a file:barcodes.fa -m 10 -o $fn'_{name}_trim.fastq.gz' $i 1> $fn'_log.txt'
done`
# Trim 2× 5ʹ random nucleotides
> for i in ∗trim∗.fastq.gz
do fn=$(basename $i trim.fastq.gz)
cutadapt -j 10 -m 10 -u 2 -o $fn'_trimFinal.fastq.gz' $i
done
I got four trimFinal.fastq.gz each sample so I used `cat 42C_1h.fastq.gz >>merge_Hsap_HEK293T_42C_1h.fastq.gz ` to merge them.
Then I created and save data tab-separated text file.
/home/owner/download/mim-tRNAseq/trim/output/merge_Hsap_HEK293T_42C_1h.fastq.gz HEK293T_42C /home/owner/download/mim-tRNAseq/trim/output/merge_Hsap_HEK293T_60C_1h.fastq.gz HEK293T_60C /home/owner/download/mim-tRNAseq/trim/output/merge_Hsap_HEK293T_rep1.fastq.gz HEK293T /home/owner/download/mim-tRNAseq/trim/output/merge_Hsap_HEK293T_rep2.fastq.gz HEK293T
And I run the command minseq
mimseq --species Hsap --cluster-id 0.97 --threads 15 --min-cov 0.0005 --max-mismatches 0.1 --control-condition HEK293T -n hg38_ --out-dir hg38_HEK293vs42c60c --max-multi 4 --remap --remap-mismatches 0.075 ../trim/output/HEK293T_42c60c.txt
Errors occur during "Differential expression analysis with DESeq2"
Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc, :
every gene contains at least one zero, cannot compute log geometric means
Calls: estimateSizeFactors ... estimateSizeFactors -> .local -> estimateSizeFactorsForMatrix
In addition: There were 21 warnings (use warnings() to see them)
Execution halted
I am puzzle. I think maybe I can't put two samples(42C_1h\60C1h) as input? Or should't I run the command of `cutadapt --no-indels -q 30,30 --trimmed-only -j 10 -a file:barcodes.fa -m 10 -o $fn'{name}_trim.fastq.gz' $i 1> $fn'_log.txt'`?
I compare the Hsap_HEK293T_rep1.fastq.gz files (one is from your provision in github and one is from my disposion. This is yours:
-rw-rw-r-- 1 owner owner 54618544 12月 2 10:25 mimseq_hek_1.fastq.gz
-rw-rw-r-- 1 owner owner 53748386 12月 2 10:26 mimseq_hek_2.fastq.gz
And this is mine:
-rw-rw-r-- 1 owner owner 1576774 11月 27 22:15 merge_Hsap_HEK293T_rep1.fastq.gz
-rw-rw-r-- 1 owner owner 900885 11月 27 22:15 merge_Hsap_HEK293T_rep2.fastq.gz
It is obviously different. How can I deal with them? Thanks for your reply.
You should not trim these files at all. All of the sequence read files in the GEO repository have already been trimmed. You can start by directly running mimseq on those files. I think the above error comes from very little to no data for many tRNAs, because of low read alignment.
Thank you for you help. Sorry for the delayed response.I am so excited last week. In summary, I successfully run mim-seq and I am so amazed at the results. I am struggled to understand the results... In the end, Thank to your excellent tools!
Thanks for your tool!! I have meet some problem when I run mimseq. The sampledata I used is:
I obtain the data from GSE152621 from your article and follow the steps to run mimseq:
mimseq --species Hsap --cluster-id 0.97 --threads 15 --min-cov 0.0005 --max-mismatches 0.1 --control-condition HEK293T -n hg38_ --out-dir hg38_HEK293Tvscase --max-multi 4 --remap --remap-mismatches 0.075 SampleData1.txt
Sadly It occur an Error and I don't know how to deal with it.
subprocess.CalledProcessError: Command '['snpindex', '-q', '1', '-D', 'hg38_HEK293Tvscase/Hsap_tRNAgenome', '-d', 'Hsap_tRNAgenome', '-V', 'hg38_HEK293Tvscase/Hsapsnp_index', '-v', 'hg38__modificationSNPs', 'hg38_HEK293Tvscase/Hsapsnp_index/hg38__modificationSNPs.iit']' died with <Signals.SIGSEGV: 11>.
I will appreciate your help.