ohan-Bioinfo closed this issue 3 months ago
The run ends here:
/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
Looks like some error happened at the genotyping step. But if you don't need that information, you can use candidate_disc_filtered_cns.txt.high_confident.post_filtering.txt
The mentioned file, candidate_disc_filtered_cns.txt.high_confident.post_filtering.txt, is empty.
Well, does this have anything to do with the error you mentioned?
Error happen at merge clip and disc feature step: chrY not exist
(the line above is repeated nine times in the log)
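The repeated "chrY not exist" error usually indicates a contig-naming mismatch between the BAM header and the annotation xTea uses: UCSC-style references name the chromosome "chrY" while Ensembl-style ones use "Y". A quick sketch of that check (`has_contig` is a hypothetical helper, not part of xTea):

```python
def has_contig(contig_names, name):
    """Check for a contig under either UCSC ("chrY") or Ensembl ("Y") naming.

    contig_names: iterable of contig names, e.g. taken from a BAM header.
    """
    names = set(contig_names)
    alt = name[3:] if name.startswith("chr") else "chr" + name
    return name in names or alt in names

# A BAM aligned to an Ensembl-style reference has no "chrY" (which would
# trigger the error above), but may still carry the same contig as "Y".
print(has_contig(["1", "2", "X", "Y"], "chrY"))  # True (found as "Y")
print(has_contig(["1", "2", "X"], "chrY"))       # False
```

If the naming schemes differ, either realign to a matching reference or supply a chromosome-name mapping so the tool and the BAM agree.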
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py:34: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
from ._gradient_boosting import predict_stages
And one more:
[DISC-TD-STEP:] Filter out chr8:42182019, no enough disc support!
What are these? Is there any explanation?
cat candidate_disc_filtered_cns.txt.high_confident.post_filtering.txt
chr11 7695684 7695672 7695684 12 1 8 6 8 1 0 47.95 29.645 1 8 6 8 10 313:313 10:30 50.0:181.0 38.0:265.0 not_transduction 0 0 0 6 0 8 0 Not-5prime-inversion two_side_tprt_both both_end_consistent hit_end_of_consensus 14 0 5 18 27 104 0 4:6:9:9:9:10:10:23:23:28:28:28:30:30 0 6 8 0 303 not_in_Alu_copy
chr13 21320625 21320625 21320641 16 1 1 3 4 0 1 10.945 22.75 1 13 4 0 -1 68:68 281:281 288.5:289.0 84.0:242.0 not_transduction 0 0 0 0 30 4 Not-5prime-inversion two_side_tprt_both both_end_consistent hit_end_of_consensus 2 1 4 110 4 0 4:29 0 3 4 0 213 not_in_Alu_copy
chr19 4097228 4097255 4097228 27 1 12 3 2 0 12 10.345 32.445 1 12 3 2 012 315:315 304:316 468.0:478.0 473.0:478.0 not_transduction 0 0 0 3 0 2 0 Not-5prime-inversion two_side_tprt_both both_end_consistent hit_end_of_consensus 13 0 4 17 9 34 0 22:22:22:22:22:22:27:27:32:32:32:32:32 3 0 2 0 1 not_in_Alu_copy
chr19 52384817 -1 52384817 -1 0 2 3 3 0 0 24.565 26.28 0 2 33 0 0 -1:-1 53:68 75.0:123.5 202.5:307.0 not_transduction 0 0 0 3 0 30 Not-5prime-inversion one_half_side one_end_consistent hit_end_of_consensus 9 7 11 6 8 80 56:27:17:17:39:44:23:37:27 1 2 3 0 232 not_in_Alu_copy
I can't find the documentation explaining what all these numbers refer to.
I am also looking forward to an explanation of the columns in this output.
These are intermediate files. You can find the detailed meaning of each field in the final VCF file.
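Until the final VCF is produced, only the leading columns of the intermediate file are self-evident from the rows pasted above (chromosome, then candidate positions). A minimal parsing sketch under that assumption (`parse_candidate_line` is a hypothetical helper; the remaining feature columns are deliberately left unlabeled, since their meaning is documented for the final VCF output):

```python
def parse_candidate_line(line):
    # Assumption: field 0 is the chromosome and field 1 a candidate
    # insertion position, as in the pasted rows; all remaining feature
    # columns are kept as an opaque list rather than guessed at.
    fields = line.split()
    return {"chrom": fields[0], "pos": int(fields[1]), "features": fields[2:]}

rec = parse_candidate_line("chr11 7695684 7695672 7695684 12 1 8 6")
print(rec["chrom"], rec["pos"])  # chr11 7695684
```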
Thank you Simon for the reply. I am having problems running the pipeline all the way through (it gets stuck at the genotyping step, although DeepForest is installed). What parameters should I use to get a VCF file as output without going through genotyping?
Could you try the GitHub version rather than the Bioconda version? It should work well with the DeepForest module. Many users have successfully run the whole pipeline.
The GitHub version took me a little further; now I can get a VCF file, although the algorithm did not complete.
UserWarning: Trying to unpickle estimator LabelEncoder from version 1.0.1 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.
UserWarning)
[2023-03-27 22:29:47.683] Start to evalute the model:
[2023-03-27 22:29:47.730] Evaluating cascade layer = 0
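The UserWarning above reports a model pickled with scikit-learn 1.0.1 being loaded under 0.24.2. One hedged workaround (a sketch only; the environment name `xt` is an assumption, and xTea's own requirements file should take precedence over any manual pin) is to install the version the model was pickled with:

```shell
# Match the runtime scikit-learn to the version the model was pickled with
# (1.0.1, per the warning). The conda env name "xt" is an assumption.
conda activate xt
pip install scikit-learn==1.0.1
```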
I think it is due to the version of scikit-learn that Python on my system is using (?). But my bigger concern is running xTea on another sample, which got me the following error:
Discordant cutoff: 4 is used!!!
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/.conda/envs/xt/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/.conda/envs/xt/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/xTea/xtea/x_TEI_locator.py", line 564, in unwrap_self_filter_by_discordant_non_barcode
return TELocator.run_filter_by_discordant_pair_by_chrom_non_barcode(*arg, **kwarg)
File "/xTea/xtea/x_TEI_locator.py", line 1111, in run_filter_by_discordant_pair_by_chrom_non_barcode
site_pos + iextend, i_is, f_dev, xannotation)
File "/xTea/xtea/x_alignments.py", line 99, in cnt_discordant_pairs
iter_alignmts = bamfile.fetch(chrm, start, end)
File "pysam/libcalignmentfile.pyx", line 1081, in pysam.libcalignmentfile.AlignmentFile.fetch
File "pysam/libchtslib.pyx", line 689, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid coordinates: start (10305) > stop (10304)
"""
Please let me know how I can debug this. Thanks, Barun
Sorry for the late follow-up. You can try installing by following the Dockerfile here: https://github.com/parklab/xTea/blob/master/Dockerfile
Please confirm whether this is the final output, because I was expecting a gVCF file once the run completed with these results in the L1 output folder.
candidate_disc_filtered_cns.txt
candidate_disc_filtered_cns.txt.before_calling_transduction
candidate_disc_filtered_cns.txt.before_calling_transduction.sites_cov
candidate_disc_filtered_cns.txt.before_filtering
candidate_disc_filtered_cns.txt.gntp.features
candidate_disc_filtered_cns.txt.gntp.features0.out
candidate_disc_filtered_cns.txt.high_confident
candidate_disc_filtered_cns.txt.high_confident.post_filtering.txt
candidate_disc_filtered_cns.txt.high_confident.post_filtering.txt.post_filtering.log
candidate_disc_filtered_cns2.txt
candidate_disc_filtered_cns2.txt.all_non_sibling_td.txt
candidate_disc_filtered_cns2.txt.all_non_sibling_td.txt.new_sites
candidate_disc_filtered_cns2.txt.all_non_sibling_td.txt.tmp_new_sites_position_only
candidate_disc_filtered_cns2.txt.all_non_sibling_td.txt.tmp_new_sites_position_only.gntp.features
candidate_disc_filtered_cns2.txt.all_non_sibling_td.txt.tmp_new_sites_position_only.gntp.features0.out
candidate_disc_filtered_cns2.txt.all_non_sibling_td.txt.unique_trsdct_disc_only
candidate_disc_filtered_cns2.txt.all_non_sibling_td.txt.unique_trsdct_disc_only_half_clip_half_disc
candidate_disc_filtered_cns2.txt.all_non_sibling_td.txt.unique_trsdct_disc_only_half_clip_half_disc_polyA
candidate_disc_filtered_cns2.txt.all_non_sibling_td.txt.unique_trsdct_disc_only_half_clip_half_disc_polyA_after_filter
candidate_disc_filtered_cns2.txt.all_non_sibling_td.txt.unique_trsdct_disc_orphan
candidate_disc_filtered_cns2.txt.all_non_sibling_td.txt.unique_trsdct_half_clip
candidate_disc_filtered_cns2.txt.high_confident
candidate_disc_filtered_cns2.txt.sibling_transduction_from_existing_list
candidate_disc_filtered_cns_post_filtering.txt
candidate_disc_filtered_cns_post_filtering.txt.post_filtering.log
candidate_list_from_clip.txt
candidate_list_from_clip.txt_tmp
candidate_list_from_disc.txt
candidate_list_from_disc.txt.clip_sites_raw_disc.txt
candidate_list_from_disc.txt.clip_sites_raw_disc.txt.slct