parklab / xTea

Comprehensive TE insertion identification with WGS/WES data from multiple sequencing technics

Sporadic inconsistent errors with xtea_long #93

Open yuliamostovoy opened 8 months ago

yuliamostovoy commented 8 months ago

Hi,

I'm trying to run xTea_long on the cloud (Terra) and I'm seeing inconsistent behavior: when I run it on exactly the same data with the same command and configuration, it sometimes fails and sometimes completes successfully. The failures seem to be related to generating the "all_ins_seqs.fa" file, although it's hard to tell from the logs exactly what's going wrong, and I don't understand why the outcome is inconsistent. I tried allocating a lot more RAM in case that was the issue, but the errors still occur. I'm attaching the logs from a few failed runs and two successful runs on the same data. I'd like to run xTea_long on a number of samples, so I'd really appreciate any help in resolving this. Thanks!

xtea_logs.tar.gz

simoncchu commented 8 months ago

In the logs of the failed runs, there are earlier errors such as:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/conda/envs/xtea/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/envs/xtea/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/xTea/xtea_long/x_reference.py", line 50, in unwrap_gnrt_flank_regions
    return XReference.run_gnrt_flank_region_for_chrm(arg, kwarg)
  File "/xTea/xtea_long/x_reference.py", line 109, in run_gnrt_flank_region_for_chrm
    s_left_region = f_fa.fetch(ref_chrm, istart, pos)
  File "pysam/libcfaidx.pyx", line 301, in pysam.libcfaidx.FastaFile.fetch
KeyError: "sequence '2' not present"
"""

The code was trying to extract the flanking sequences of a region on chromosome "2" and failed because the reference FASTA has no sequence with that name. However, earlier code checks whether the reference names that chromosome "chr2" or "2", so I am not sure why the error is reported there.
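For readers unfamiliar with this kind of check, here is a minimal sketch of resolving a chromosome name against the names present in a reference. It is not xTea's actual code: `resolve_contig` is a hypothetical helper, the contig lists are made up for illustration, and only the "chr" prefix convention is handled (aliases such as "chrM" vs "MT" are not).

```python
from typing import Iterable, Optional


def resolve_contig(name: str, references: Iterable[str]) -> Optional[str]:
    """Return the spelling of `name` that exists in `references`, or None.

    Hypothetical helper for illustration; not part of xTea or pysam.
    """
    known = set(references)
    if name in known:
        return name
    # Try the other common convention: strip or add the "chr" prefix.
    alt = name[3:] if name.startswith("chr") else "chr" + name
    return alt if alt in known else None


# Made-up contig lists standing in for the names in a FASTA index.
ucsc_style = ["chr1", "chr2", "chrX"]
ensembl_style = ["1", "2", "X"]

print(resolve_contig("2", ucsc_style))           # chr2
print(resolve_contig("chr2", ensembl_style))     # 2
print(resolve_contig("GL000220.1", ucsc_style))  # None
```

With pysam, the list of names could be taken from the `references` attribute of an open `pysam.FastaFile`. Resolving the name once and failing loudly on `None` makes a naming mismatch visible before `fetch` raises a `KeyError`.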

Would it be possible to share the workspace with me for testing?

yuliamostovoy commented 8 months ago

Thanks, yes, I can share the workspace (testing with public data). Should I use your listed gmail account?

yuliamostovoy commented 7 months ago

@simoncchu I shared a couple of workspaces - I think the second has data that you should be able to see directly without any further permission-wrangling. Let me know if there are any issues, and thanks!

simoncchu commented 7 months ago

I found the shared workspaces. I checked the failed jobs, but I still cannot work out why the errors were triggered. I will download one of the smaller BAM files, test it locally, and get back to you.

yuliamostovoy commented 7 months ago

Thank you, I really appreciate it!