Open zhouyjjj opened 11 months ago
Hi Tobias,
I also met a problem that says no RG:
Running slamDunk tcount for 1 files (12 threads)
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "/public/slst/home/lict/anaconda3/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 431, in _process_worker
r = call_item()
File "/public/slst/home/lict/anaconda3/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 285, in call
return self.fn(*self.args, self.kwargs)
File "/public/slst/home/lict/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 595, in call
return self.func(*args, *kwargs)
File "/public/slst/home/lict/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 252, in call
return [func(args, kwargs)
File "/public/slst/home/lict/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 252, in
And this is what i generated filtered bam from STAR mapped files: A01890:220:H2CNYDSX7:4:1545:9498:5916 0 chr1 13026 255 150M 0 0 GCATGAAGGCTGTCAACCAGTCCATAGGCAAGCCTGGCTGCCTCCAGCTGGGTCGACAGACAGGGGCTGGAGAAGGGGAGAAGAGGAAAGTGAGGTTGCCTGCCCTGTCTCCTACCTGAGGCTGAGGAAGGAGAAGGGGATGCACTGTTG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:148 nM:i:0 A01890:220:H2CNYDSX7:4:1545:9507:5932 0 chr1 13026 255 150M 0 0 GCATGAAGGCTGTCAACCAGTCCATAGGCAAGCCTGGCTGCCTCCAGCTGGGTCGACAGACAGGGGCTGGAGAAGGGGAGAAGAGGAAAGTGAGGTTGCCTGCCCTGTCTCCTACCTGAGGCTGAGGAAGGAGAAGGGGATGCACTGTTG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:148 nM:i:0
The RG is not in the header of sam file, is it the one behind the tags? And i also rerun filter.py, it does not solve this problem.
thanks.
Hi @zhouyjjj - sorry for the long radio silence, I was on holiday.
Currently Slamdunk really only supports NGM alignments due to the said requirement of certain BAM tags to be present.
The identity value basically just states how much of the original read sequence was aligned in some way or another.
Can you send me the actual command where you call slamdunk all
so I can have a look?
Hi Tobias,
This is how my command looks like:
slamdunk all -r /public/slst/home/lict/ZYJ/referenceseq/GRCh38.p14.genome.fa -b /public/slst/home/lict/ZYJ/simu/genlist.bed -5 8 -n 100 -m -t 12 -o ./ --skip-sam /public/slst/home/lict/ZYJ/result/fastqc/I2_L1_1_val_1.fq /public/slst/home/lict/ZYJ/result/fastqc/I2_L1_2_val_2.fq
The problem is ngm works extremely slow with my actual seq data, which is around 18g per file, and mapping rate is significantly lower than using STAR. So I am still trying to use STAR mapped sam as input. Currently I find replacement for XI ,RG, NM tags, the only problem is MPtag now.
What you could try is to crank the number of threads up even more - maybe to 30 and see if that helps.
To help with the tags, could you instead of supplying the files via commandline, type up a sample sheet in the appropriate formate and run it with this?
Hi Tobias,
I am trying to use STAR mapped file as input of slamdunk, as I have some junction reads that ngm seems unable to handle properly.
And the filter didn't work correctly, the output looks like this:
Creating output directory: ./f10sd Running slamDunk filter for 1 files (12 threads) joblib.externals.loky.process_executor._RemoteTraceback: """ Traceback (most recent call last): File "/public/slst/home/lict/anaconda3/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 431, in _process_worker r = call_item() File "/public/slst/home/lict/anaconda3/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 285, in call return self.fn(*self.args, self.kwargs) File "/public/slst/home/lict/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 595, in call return self.func(*args, *kwargs) File "/public/slst/home/lict/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 252, in call return [func(args, kwargs) File "/public/slst/home/lict/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 252, in
return [func(*args, **kwargs)
File "/public/slst/home/lict/anaconda3/lib/python3.8/site-packages/slamdunk/slamdunk.py", line 170, in runFilter
filter.Filter(bam, outputBAM, getLogFile(outputLOG), bed, mq, minIdentity, maxNM, printOnly, verbose)
File "/public/slst/home/lict/anaconda3/lib/python3.8/site-packages/slamdunk/dunks/filter.py", line 270, in Filter
mappedReads, unmappedReads, filteredReads, mqFiltered, idFiltered, nmFiltered, multimapper = multimapUTRRetainment (infile, outfile, bed, minIdentity, NM, log)
File "/public/slst/home/lict/anaconda3/lib/python3.8/site-packages/slamdunk/dunks/filter.py", line 107, in multimapUTRRetainment
if(float(read.get_tag("XI")) < minIdentity):
File "pysam/libcalignedsegment.pyx", line 2507, in pysam.libcalignedsegment.AlignedSegment.get_tag
File "pysam/libcalignedsegment.pyx", line 2546, in pysam.libcalignedsegment.AlignedSegment.get_tag
KeyError: "tag 'XI' not present"
"""
So I checked the filter.py finding that it asked for "XI" and "NM" tag from sam file which STAR aligner does not output. I think the "NM" tag can be substitute with the "nM" tag in STAR map, but I am not sure what can be used to replace "XI" tag. Also, I am curious why "minIdentity" is set to 0.8 in filter.py. Can you give me some hint on how to get a identity value from STAR aligner.
Thank you!