bredward closed this issue 2 years ago
Hi, Thank you for your question. I think your issue is caused by the wrong 'intervals' module being imported. Several Python packages install a module named intervals, and more than one of them may be present in your environment. I suggest you first uninstall any package that provides a module named intervals (such as intervals or pyinterval), or create a new conda environment. Then install python-intervals with 'pip install python-intervals'. You can run "import intervals as I; I.empty()" to test it. Best, Zelin
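To see which 'intervals' module Python actually picks up, a small check along these lines may help. This is only a sketch: the helper name inspect_module is made up for illustration, and it assumes the presence of an 'empty' attribute is enough to tell python-intervals apart from the other packages, as the one-liner above suggests.

```python
import importlib
import importlib.util


def inspect_module(module_name, required_attr):
    """Report where a module would be loaded from and whether it has an attribute.

    inspect_module is a hypothetical helper, not part of circfull.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        # No installed package provides this module name.
        return {"found": False, "path": None, "has_attr": False}
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # The module exists on disk but fails to import.
        return {"found": True, "path": spec.origin, "has_attr": False}
    return {
        "found": True,
        "path": getattr(module, "__file__", spec.origin),
        "has_attr": hasattr(module, required_attr),
    }


if __name__ == "__main__":
    # The path shows which installed package provides 'intervals';
    # has_attr tells whether I.empty() would be available.
    print(inspect_module("intervals", "empty"))
```

If the reported path points into a package other than python-intervals, or has_attr is False, the environment is probably still importing one of the conflicting packages.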
Hi again,
Thank you for responding so quickly! As you suggested, I created a new conda environment without the 'python-intervals' package and used pip to install python-intervals instead, and the issue was resolved! However, I am now getting a new error that, if I understand correctly, seems to be related to the path(s) I am using for my output directory when running in cRG mode.
Currently I have a series of files that I have run through RG and DNSC, but when I try to feed those into cRG, I get this error:
Check DNSC directory
Make query sequence: createFastq
Check fastq file
Check anno file
Check genome file
Align fastq to reference genome: alignFastq
Transform SAM to BAM: sam2bam
Analyze SAM file: explainFL
Filter and classify candidates: filterFL
Adjust normal: adjExplainNormal
Traceback (most recent call last):
File "/home/bredward/miniconda3/envs/circfull.new/bin/circfull", line 8, in
It seems to be looking for the 'explainFL_Normal.txt' file in the RG folder, which I do have in my RG run output directory, but the path is not pointing to this folder -- it is pointing to the cRG directory. So far I have stored the output of both RG and DNSC in the same directory, and from what I understood, I would use this same directory for the cRG input & output...is that correct? For reference, here is the line I am running:
circfull cRG -t $thread -g ../ref_genome_files/Lotusjaponicus_Gifu_v1.2_genome.fa -a ../ref_genome_files/sort.gtf.gz -f split.1_2 -o split.1_2
And here is the structure of my output directory after running cRG and getting this error:
split.1_2
├── DNSC
│   ├── TideHunter.tab
│   ├── TideHunter_Pass.tab
│   ├── novoCluster.txt
│   ├── novoseq.fa
│   ├── raw2raw.paf
│   ├── raw2raw.sort.paf
│   ├── rawseq.fa
│   ├── test_1.fa
│   └── tmp
├── RG
│   ├── BS_Normal.txt
│   ├── BS_Normal_adj.txt
│   ├── ExonEdict.npy
│   ├── ExonEdict_fsj.npy
│   ├── ExonSdict.npy
│   ├── ExonSdict_fsj.npy
│   ├── circFL_Normal.bed
│   ├── circFL_Normal.txt
│   ├── circSeq.fa
│   ├── circSeq.th
│   ├── constructFL_Normal.txt
│   ├── constructFL_Normal_adj.txt
│   ├── explainFL.txt
│   ├── explainFL_ID2Type.txt
│   ├── explainFL_Normal.txt
│   ├── explainFL_Normal_adj.txt
│   ├── explainFL_noprimary.txt
│   ├── fusion
│   │   └── tmp
│   ├── result_Normal.txt
│   ├── strandDict.npy
│   ├── strandDict_fsj.npy
│   ├── test.minimap2.sam
│   └── tmp
└── cRG
    ├── RG
    │   ├── explainFL.txt
    │   ├── explainFL_ID2Type.txt
    │   ├── explainFL_noprimary.txt
    │   ├── fusion
    │   │   └── tmp
    │   ├── test.minimap2.sam
    │   └── tmp
    └── pseudo.fq
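One way to see at which step the cRG run stopped is to list the files present in the first RG output but absent from cRG/RG. The sketch below assumes both runs are expected to write similarly named files, which may not hold for every file; missing_files is a hypothetical helper, not a circfull function.

```python
from pathlib import Path


def missing_files(reference_dir, candidate_dir):
    """Return sorted names of regular files in reference_dir absent from candidate_dir."""
    reference = {p.name for p in Path(reference_dir).iterdir() if p.is_file()}
    candidate_path = Path(candidate_dir)
    # Treat a missing candidate directory as empty rather than raising.
    candidate = (
        {p.name for p in candidate_path.iterdir() if p.is_file()}
        if candidate_path.is_dir()
        else set()
    )
    return sorted(reference - candidate)


if __name__ == "__main__":
    out = Path("split.1_2")  # output directory used in the command above
    if (out / "RG").is_dir():
        for name in missing_files(out / "RG", out / "cRG" / "RG"):
            print(name)
```

For the tree above, the printed list would include explainFL_Normal.txt, the file named in the error.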
Any help/thoughts on this would be GREATLY appreciated!
Hi! I have some thoughts, but I am not sure about them. I suggest you run this command
cut -f 1 split.1_2/cRG/RG/explainFL_ID2Type.txt |sort |uniq -c
to check whether any normal reads were selected. If you do not see any 'N' type, it means that cRG did not get any results. Normally, this error would not happen unless your dataset is very small. Please let me know your findings.
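If the shell pipeline is inconvenient, roughly the same count can be done in Python. This is a sketch that assumes explainFL_ID2Type.txt is tab-separated with the read type in the first column, which is what cut -f 1 implies; count_first_field is a hypothetical helper name.

```python
from collections import Counter


def count_first_field(path, sep="\t"):
    """Count how often each value appears in the first column of a delimited file."""
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:  # blank lines are skipped here, unlike cut | sort | uniq -c
                counts[line.split(sep, 1)[0]] += 1
    return counts


if __name__ == "__main__":
    import os

    path = "split.1_2/cRG/RG/explainFL_ID2Type.txt"
    if os.path.exists(path):
        for value, n in sorted(count_first_field(path).items()):
            print(n, value)
```

If 'N' is missing from the output, cRG found no normal reads in this dataset.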
Hi there,
I keep having an issue with circfull RG not running to completion. It seems to be related to multiprocessing; however, the error still occurs when I reduce the number of threads. I've done a little research to see if it's something I can resolve myself, but I have not found a reasonable explanation for what could be causing the problem. If you have any thoughts on how to troubleshoot this (see the log pasted below), or if this looks at all familiar, your advice would be greatly appreciated!
For reference, here is the log from the run, with the specific error at the bottom. Setting the number of threads below the default does not make a difference. I've also tried splitting the fastq input into smaller files, and that doesn't help either: I see the error message regardless.
2022-06-01 11:33:36
Check fastq file
2022-06-01 11:33:36
Check anno file
2022-06-01 11:33:36
Check genome file
2022-06-01 11:33:36
Align fastq to reference genome: alignFastq
2022-06-01 12:22:14
Transform SAM to BAM: sam2bam
2022-06-01 12:22:17
Analyze SAM file: explainFL
2022-06-01 12:22:36
Filter and classify candidates: filterFL
2022-06-01 12:22:43 #### |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| | ETA: 0:00:00
Adjust normal: adjExplainNormal
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/bredward/miniconda3/envs/circfull/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/bredward/miniconda3/envs/circfull/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/bredward/miniconda3/envs/circfull/lib/python3.9/site-packages/circfull-0.0.8-py3.9.egg/circfull/RG_adjExplainNormal.py", line 30, in getNewFL
    exon1=createIntervals(each1['exon_start'],each1['exon_end'])
  File "/home/bredward/miniconda3/envs/circfull/lib/python3.9/site-packages/circfull-0.0.8-py3.9.egg/circfull/RG_adjExplainNormal.py", line 7, in createIntervals
    x=I.empty()
AttributeError: module 'intervals' has no attribute 'empty'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/bredward/miniconda3/envs/circfull/bin/circfull", line 33, in
    sys.exit(load_entry_point('circfull==0.0.8', 'console_scripts', 'circfull')())
  File "/home/bredward/miniconda3/envs/circfull/lib/python3.9/site-packages/circfull-0.0.8-py3.9.egg/circfull/circFL_main.py", line 37, in main
  File "/home/bredward/miniconda3/envs/circfull/lib/python3.9/site-packages/circfull-0.0.8-py3.9.egg/circfull/RG.py", line 87, in RG
  File "/home/bredward/miniconda3/envs/circfull/lib/python3.9/site-packages/circfull-0.0.8-py3.9.egg/circfull/RG_adjExplainNormal.py", line 60, in adjExplainNormal
  File "/home/bredward/miniconda3/envs/circfull/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/bredward/miniconda3/envs/circfull/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
AttributeError: module 'intervals' has no attribute 'empty'
Again, I appreciate your help with this. I am looking forward to utilizing the full pipeline as soon as I am able to resolve this!
-Bri Edwards