parklab / xTea

Comprehensive TE insertion identification with WGS/WES data from multiple sequencing technologies

Gene list Error #74

Closed by ohan-Bioinfo 1 year ago

ohan-Bioinfo commented 1 year ago

I assume the script fails while trying to load the gene list. Could you please advise if this error looks familiar?

Traceback (most recent call last):
  File "/cromwell_root/bin/x_TEA_main.py", line 1031, in <module>
    gff.load_gene_annotation_with_extnd(iextnd)
  File "/usr/local/bin/x_gene_annotation.py", line 71, in load_gene_annotation_with_extnd
    ori_start_pos=int(fields[3])
ValueError: invalid literal for int() with base 10: 'pseudogene'
/miniconda/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator LabelEncoder from version 1.0.1 when using version 1.0.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations
  warnings.warn(
Traceback (most recent call last):
  File "/cromwell_root/bin/x_TEA_main.py", line 932, in <module>
    gc.predict_for_site(sf_model, sf_xTEA, sf_new)
  File "/usr/local/bin/x_genotype_classify.py", line 146, in predict_for_site
    with open(sf_xTEA) as fin_xTEA, open(sf_new, "w") as fout_new:
FileNotFoundError: [Errno 2] No such file or directory: '/cromwell_root/WorkingDir/NDX-22-829-004/SVA/candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene.txt'
sort: cannot read: /cromwell_root/WorkingDir/NDX-22-829-004/SVA/candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt: No such file or directory
Running command: sort -k1,1V -k2,2n -o /cromwell_root/WorkingDir/NDX-22-829-004/SVA/candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt.sorted /cromwell_root/WorkingDir/NDX-22-829-004/SVA/candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt

Traceback (most recent call last):
  File "/cromwell_root/bin/x_TEA_main.py", line 964, in <module>
    gvcf.cvt_raw_rslt_to_gvcf(s_sample_id, sf_bam, sf_raw_rslt, i_rep_type, sf_ref, sf_vcf)
  File "/usr/local/bin/x_gvcf.py", line 199, in cvt_raw_rslt_to_gvcf
    with open(sf_raw_rslt_sorted) as fin_rslt:
FileNotFoundError: [Errno 2] No such file or directory: '/cromwell_root/WorkingDir/NDX-22-829-004/SVA/candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt.sorted'
Usage: x_TEA_main.py [options]

x_TEA_main.py: error: no such option: --bamsnap
Usage: x_TEA_main.py [options]

x_TEA_main.py: error: no such option: --bamsnap

Should I reformat the GFF file?

simoncchu commented 1 year ago

Which gene annotation file (gff3 file) do you use as input?

ohan-Bioinfo commented 1 year ago

I obtained the GFF file from the given link: https://www.gencodegenes.org/human/release_33lift37.html

simoncchu commented 1 year ago

The file should work. It's weird. Not sure exactly which step is wrong.

ohan-Bioinfo commented 1 year ago

An earlier error appears above the gene annotation error, as a long list of:

Error happen at merge clip and disc feature step: chr22 not exist
Error happen at merge clip and disc feature step: chr22 not exist
Error happen at merge clip and disc feature step: chr22 not exist
Error happen at merge clip and disc feature step: chr22 not exist
Error happen at merge clip and disc feature step: chr22 not exist
Error happen at merge clip and disc feature step: chr22 not exist
Error happen at merge clip and disc feature step: chr22 not exist
Error happen at merge clip and disc feature step: chr22 not exist
Error happen at merge clip and disc feature step: chr22 not exist
Error happen at merge clip and disc feature step: chr22 not exist
Error happen at merge clip and disc feature step: chr22 not exist
Error happen at merge clip and disc feature step: chr22 not exist
Error happen at merge clip and disc feature step: chr22 not exist
Error happen at merge clip and disc feature step: chr22 not exist
Error happen at merge clip and disc feature step: chr22 not exist
Error happen at merge clip and disc feature step: chr22 not exist

ohan-Bioinfo commented 1 year ago

Full log (please take a look):

Error happen at merge clip and disc feature step: chrY not exist
Error happen at merge clip and disc feature step: chrY not exist
Error happen at merge clip and disc feature step: chrY not exist
Error happen at merge clip and disc feature step: chrY not exist
Working on "clip-disc-filtering" step!
Current working folder is: /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/cns/

/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py:34: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
from ._gradient_boosting import predict_stages
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py:34: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
from ._gradient_boosting import predict_stages
Ave coverage is 11.674999999999999: automatic parameters (clip, disc, clip-disc) with value (2, 3 ,0)

Mean insert size is: 178.92676435799711

Standard derivation is: 138.5675854752448

Read length is: 76.0

Maximum insert size is: 594

Average coverage is: 11.674999999999999

Filter (on cns) cutoff: 2 and 3 are used!!!
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 74 sequences (1584 bp)...
[M::mem_process_seqs] Processed 74 reads in 0.006 CPU sec, 0.002 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -t 12 -T 9 -k 9 -o /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/cns/temp_clip.sam.non_polyA.sam /cromwell_root/fc-00bcc31d-9daf-40fa-bca6-2c6aa332a71c/rep_lib_annotation/consensus/SVA.fa /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/cns/candidate_sites_all_clip.fq.non_polyA.fq
[main] Real time: 0.007 sec; CPU: 0.009 sec
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 1 sequences (8 bp)...
[M::mem_process_seqs] Processed 1 reads in 0.002 CPU sec, 0.001 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -t 12 -T 7 -k 7 -o /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/cns/temp_clip.sam.polyA.sam -c 70 /cromwell_root/fc-00bcc31d-9daf-40fa-bca6-2c6aa332a71c/rep_lib_annotation/consensus/SVA.fa /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/cns/candidate_sites_all_clip.fq.polyA.fq
[main] Real time: 0.005 sec; CPU: 0.004 sec
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 66 sequences (4980 bp)...
[M::mem_process_seqs] Processed 66 reads in 0.024 CPU sec, 0.008 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -t 12 -o /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/cns/temp_disc.sam /cromwell_root/fc-00bcc31d-9daf-40fa-bca6-2c6aa332a71c/rep_lib_annotation/consensus/SVA.fa /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/cns/candidate_sites_all_disc.fa
[main] Real time: 0.011 sec; CPU: 0.026 sec
Running command: bwa mem -t 12 -T 9 -k 9 -o /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/cns/temp_clip.sam.non_polyA.sam /cromwell_root/fc-00bcc31d-9daf-40fa-bca6-2c6aa332a71c/rep_lib_annotation/consensus/SVA.fa /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/cns/candidate_sites_all_clip.fq.non_polyA.fq

Running command: bwa mem -t 12 -T 7 -k 7 -o /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/cns/temp_clip.sam.polyA.sam -c 70 /cromwell_root/fc-00bcc31d-9daf-40fa-bca6-2c6aa332a71c/rep_lib_annotation/consensus/SVA.fa /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/cns/candidate_sites_all_clip.fq.polyA.fq

Running command: bwa mem -t 12 -o /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/cns/temp_disc.sam /cromwell_root/fc-00bcc31d-9daf-40fa-bca6-2c6aa332a71c/rep_lib_annotation/consensus/SVA.fa /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/cns/candidate_sites_all_disc.fa

Current working folder is: /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/

/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py:34: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
from ._gradient_boosting import predict_stages
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py:34: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
from ._gradient_boosting import predict_stages
[bwa_index] Pack FASTA... 0.18 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 9.78 seconds elapse.
[bwa_index] Update BWT... 0.22 sec
[bwa_index] Pack forward-only FASTA... 0.11 sec
[bwa_index] Construct SA from BWT and Occ... 3.69 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/all_with_polymerphic_flanks.fa
[main] Real time: 14.155 sec; CPU: 13.994 sec
Ave coverage is 10.815: automatic parameters (clip, disc, clip-disc) with value (2, 3 ,0)

Mean insert size is: 176.80330571865198

Standard derivation is: 117.27342863836222

[re-select step]:Filtered out: chr1:90892982 fall in repetitive region.
[re-select step]:Filtered out: chr12:7198975 fall in repetitive region.
Running command with output: cat /cromwell_root/fc-00bcc31d-9daf-40fa-bca6-2c6aa332a71c/rep_lib_annotation/SVA/hg38/hg38_FL_SVA_flanks_3k.fa /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/all_with_polymerphic_flanks.fa.polymerphic_only.fa

Running command: bwa index /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/all_with_polymerphic_flanks.fa

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 37845 sequences (786877 bp)...
[M::mem_process_seqs] Processed 37845 reads in 21.074 CPU sec, 5.652 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -t 12 -T 9 -k 9 -o /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/temp_transduction_clip.sam.non_polyA.sam /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/all_with_polymerphic_flanks.fa /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/raw_candidate_sites_all_clip.fq.non_polyA.fq
[main] Real time: 5.807 sec; CPU: 21.171 sec
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 131 sequences (966 bp)...
[M::mem_process_seqs] Processed 131 reads in 0.051 CPU sec, 0.019 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -t 12 -T 7 -k 7 -o /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/temp_transduction_clip.sam.polyA.sam -c 70 /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/all_with_polymerphic_flanks.fa /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/raw_candidate_sites_all_clip.fq.polyA.fq
[main] Real time: 0.049 sec; CPU: 0.078 sec
Running command: bwa mem -t 12 -T 9 -k 9 -o /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/temp_transduction_clip.sam.non_polyA.sam /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/all_with_polymerphic_flanks.fa /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/raw_candidate_sites_all_clip.fq.non_polyA.fq

Running command: bwa mem -t 12 -T 7 -k 7 -o /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/temp_transduction_clip.sam.polyA.sam -c 70 /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/all_with_polymerphic_flanks.fa /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/raw_candidate_sites_all_clip.fq.polyA.fq

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 15615 sequences (1179886 bp)...
[M::mem_process_seqs] Processed 15615 reads in 10.106 CPU sec, 2.531 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -t 12 -o /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/temp_transduction_disc.sam /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/all_with_polymerphic_flanks.fa /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/raw_candidate_sites_all_disc.fa
[main] Real time: 2.603 sec; CPU: 10.153 sec
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 8 sequences (606 bp)...
[M::mem_process_seqs] Processed 8 reads in 0.002 CPU sec, 0.001 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -t 12 -o /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/temp_transduction_disc_cns.sam /cromwell_root/fc-00bcc31d-9daf-40fa-bca6-2c6aa332a71c/rep_lib_annotation/consensus/SVA.fa /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/raw_candidate_sites_all_disc_focal_sites.fa
[main] Real time: 0.005 sec; CPU: 0.005 sec
Running command: bwa mem -t 12 -o /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/temp_transduction_disc.sam /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/all_with_polymerphic_flanks.fa /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/raw_candidate_sites_all_disc.fa

[DISC-TD-STEP:] Filter out chr9:72805495, aligned to itself flanking regions
[DISC-TD-STEP:] Filter out chr9:72805495, aligned to itself flanking regions
[DISC-TD-STEP:] Filter out chr1:151770106, aligned to itself flanking regions
Running command: bwa mem -t 12 -o /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/temp_transduction_disc_cns.sam /cromwell_root/fc-00bcc31d-9daf-40fa-bca6-2c6aa332a71c/rep_lib_annotation/consensus/SVA.fa /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/raw_candidate_sites_all_disc_focal_sites.fa

Current working folder is: /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/tmp/transduction/

Ave coverage is 10.815: automatic parameters (clip, disc, clip-disc) with value (2, 3 ,0)

Mean insert size is: 176.80330571865198

Standard derivation is: 117.27342863836222

/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py:34: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
from ._gradient_boosting import predict_stages
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py:34: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
from ._gradient_boosting import predict_stages
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py:34: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
from ._gradient_boosting import predict_stages
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py:34: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
from ._gradient_boosting import predict_stages
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py:34: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
from ._gradient_boosting import predict_stages
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py:34: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
from ._gradient_boosting import predict_stages
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py:34: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
from ._gradient_boosting import predict_stages
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py:34: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
from ._gradient_boosting import predict_stages
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py:34: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
from ._gradient_boosting import predict_stages
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py:34: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
from ._gradient_boosting import predict_stages
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/base.py:251: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.18.1 when using version 0.20.0. This might lead to breaking code or invalid results. Use at your own risk.
UserWarning)
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/base.py:251: UserWarning: Trying to unpickle estimator RandomForestClassifier from version 0.18.1 when using version 0.20.0. This might lead to breaking code or invalid results. Use at your own risk.
UserWarning)
Running command: sort -k1,1V -k2,2n -o /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt.sorted /cromwell_root/workingdir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/SVA/candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt

/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py:34: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
from ._gradient_boosting import predict_stages
/root/miniconda/envs/xtea_env/lib/python3.7/site-packages/sklearn/ensemble/gradient_boosting.py:34: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
from ._gradient_boosting import predict_stages

simoncchu commented 1 year ago

Could you post the top lines of your gff3 file? The error message seems clear: "ValueError: invalid literal for int() with base 10: 'pseudogene'". The field being parsed as the start position (fields[3]) is not a number but "pseudogene".
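To locate the offending records, a quick check along these lines may help. This is a minimal sketch, not xTea's parser: the function name `find_bad_coordinate_lines` is hypothetical, and it assumes that the loader reads the start and end positions from `fields[3]` and `fields[4]`, as the traceback suggests. How xTea actually splits each line is not shown in this thread, so the sketch tries both tab splitting and generic whitespace splitting; a source name containing a space would shift the fields only under the latter.

```python
def find_bad_coordinate_lines(lines, split_on_tab=True, max_report=10):
    """Return (line_number, fields[3]) for records whose start or end field is not an integer."""
    bad = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        fields = line.split("\t") if split_on_tab else line.split()
        if len(fields) < 5:
            bad.append((line_no, "<too few fields>"))
        else:
            try:
                int(fields[3])  # start position
                int(fields[4])  # end position
            except ValueError:
                bad.append((line_no, fields[3]))
        if len(bad) >= max_report:
            break
    return bad


# Illustrative records only (hypothetical, not copied from the file in this thread).
sample = [
    "##gff-version 3\n",
    "chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tID=gene1\n",
    "chr1\tCurated Genomic\tpseudogene\t131068\t134836\t.\t+\t.\tID=gene2\n",
]
print(find_bad_coordinate_lines(sample, split_on_tab=True))   # []
print(find_bad_coordinate_lines(sample, split_on_tab=False))  # [(3, 'pseudogene')]
```

For a real annotation file, pass an open file handle instead of `sample`, for example `find_bad_coordinate_lines(open(path))`, where `path` is the decompressed gff3 file.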

ohan-Bioinfo commented 1 year ago

##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
#!genome-build GRCh38.p14
#!genome-build-accession NCBI_Assembly:GCF_000001405.40
#!annotation-source NCBI Homo sapiens Annotation Release 110
##sequence-region NC_000001.11 1 248956422
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606
NC_000001.11    RefSeq  region  1       248956422       .       +       .       ID=NC_000001.11:1..248956422;Dbxref=taxon:9606;Name=1;chromosome=1;gbkey=Src;genome=chromosome;mol_type=genomic DNA
NC_000001.11    BestRefSeq      pseudogene      11874   14409   .       +       .       ID=gene-DDX11L1;Dbxref=GeneID:100287102,HGNC:HGNC:37102;Name=DDX11L1;description=DEAD/H-box helicase 11 like 1 (pseudogene);gbkey=Gene;gene=DDX11L1;gene_biotype=transcribed_pseudogene;pseudo=true
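
The records above use RefSeq accessions such as NC_000001.11 in the first column, whereas the logs in this thread report positions as chr1, chr22, and so on. The following is a minimal sketch for checking whether an annotation file's sequence names overlap with the names used elsewhere in a run. The function names and the `expected` list are hypothetical, and whether xTea requires these names to match is an assumption here, not something confirmed in this thread.

```python
def gff_seqids(lines):
    """Collect the distinct values of the first column of GFF3 records."""
    seqids = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        seqids.add(line.split(None, 1)[0])
    return seqids


def missing_contigs(lines, expected):
    """Return the expected contig names that never appear as a seqid."""
    return sorted(set(expected) - gff_seqids(lines))


# Records abbreviated from the lines posted above (attributes shortened).
records = [
    "##gff-version 3\n",
    "NC_000001.11\tRefSeq\tregion\t1\t248956422\t.\t+\t.\tID=NC_000001.11:1..248956422\n",
    "NC_000001.11\tBestRefSeq\tpseudogene\t11874\t14409\t.\t+\t.\tID=gene-DDX11L1\n",
]
print(missing_contigs(records, ["chr1", "chr22"]))  # ['chr1', 'chr22']
```
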
simoncchu commented 1 year ago

Could you try with https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_33/gencode.v33.annotation.gff3.gz ? (Decompress it before use.)
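
If a shell with gunzip is not at hand, the decompression step can be done with the Python standard library. This is a minimal sketch: the helper name `decompress_gzip` is hypothetical, and the file names in the commented example assume the archive has already been downloaded to the current directory.

```python
import gzip
import shutil


def decompress_gzip(src_path, dst_path):
    """Stream-decompress a .gz file to dst_path without loading it all into memory."""
    with gzip.open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst_path


# Example (assumed local file names):
# decompress_gzip("gencode.v33.annotation.gff3.gz", "gencode.v33.annotation.gff3")
```
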

ohan-Bioinfo commented 1 year ago

After using the GFF file above, what does this message mean?

[DISC-TD-STEP:] Filter out chr8:42182019, no enough disc support!


[DISC-TD-STEP:] Filter out chr8:42182019, no enough disc support!
[DISC-TD-STEP:] Filter out chr1:108807576, no enough disc support!
[DISC-TD-STEP:] Filter out chr15:19988084, no enough disc support!
[DISC-TD-STEP:] Filter out chr15:19988312, no enough disc support!
Running command: bwa mem -t 8 -o /cromwell_root/WorkingDir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/L1/tmp/transduction/temp_transduction_disc_cns.sam /cromwell_root/fc-00bcc31d-9daf-40fa-bca6-2c6aa332a71c/rep_lib_annotation/consensus/LINE1.fa /cromwell_root/WorkingDir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/L1/tmp/transduction/raw_candidate_sites_all_disc_focal_sites.fa

Current working folder is: /cromwell_root/WorkingDir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/L1/tmp/transduction/

Ave coverage is 10.552000000000001: automatic parameters (clip, disc, clip-disc) with value (2, 3 ,0)

Mean insert size is: 177.52539486568182

Standard derivation is: 118.77350691544784

[re-select step]:Filtered out: chr1:165752252 fall in repetitive region.
[re-select step]:Filtered out: chr1:178747171 fall in repetitive region.
[re-select step]:Filtered out: chr1:178747358 fall in repetitive region.
[re-select step]:Filtered out: chr1:178747601 fall in repetitive region.
[re-select step]:Filtered out: chr2:12023938 fall in repetitive region.
[re-select step]:Filtered out: chr2:131480777 fall in repetitive region.
[re-select step]:Filtered out: chr2:148947924 fall in repetitive region.
[re-select step]:Filtered out: chr2:167714308 fall in repetitive region.
[re-select step]:Filtered out: chr2:167715162 fall in repetitive region.
[re-select step]:Filtered out: chr2:167715531 fall in repetitive region.
[re-select step]:Filtered out: chr3:39411118 fall in repetitive region.
[re-select step]:Filtered out: chr5:80351177 fall in repetitive region.
[re-select step]:Filtered out: chr5:80351398 fall in repetitive region.
[re-select step]:Filtered out: chr5:80351691 fall in repetitive region.
[re-select step]:Filtered out: chr5:100391625 fall in repetitive region.
[re-select step]:Filtered out: chr6:26897533 fall in repetitive region.
[re-select step]:Filtered out: chr6:26898384 fall in repetitive region.
[re-select step]:Filtered out: chr6:26899051 fall in repetitive region.
[re-select step]:Filtered out: chr6:32326239 fall in repetitive region.
[re-select step]:Filtered out: chr6:33451836 fall in repetitive region.
[re-select step]:Filtered out: chr8:12665235 fall in repetitive region.
[re-select step]:Filtered out: chr8:12665350 fall in repetitive region.
[re-select step]:Filtered out: chr8:100706886 fall in repetitive region.
[re-select step]:Filtered out: chr8:100706997 fall in repetitive region.
[re-select step]:Filtered out: chr9:82058876 fall in repetitive region.
[re-select step]:Filtered out: chr9:82059226 fall in repetitive region.
[re-select step]:Filtered out: chr9:82059833 fall in repetitive region.
[re-select step]:Filtered out: chr10:77982066 fall in repetitive region.
[re-select step]:Filtered out: chr11:6108638 fall in repetitive region.
[re-select step]:Filtered out: chr11:89047236 fall in repetitive region.
[re-select step]:Filtered out: chr11:89047796 fall in repetitive region.
[re-select step]:Filtered out: chr11:89753656 fall in repetitive region.
[re-select step]:Filtered out: chr11:108229378 fall in repetitive region.
[re-select step]:Filtered out: chr14:24183972 fall in repetitive region.
[re-select step]:Filtered out: chr15:92795885 fall in repetitive region.
[re-select step]:Filtered out: chr17:45436837 fall in repetitive region.
[re-select step]:Filtered out: chr17:45437350 fall in repetitive region.
[re-select step]:Filtered out: chr17:45437673 fall in repetitive region.
[re-select step]:Filtered out: chr17:48875109 fall in repetitive region.
[re-select step]:Filtered out: chr17:59056942 fall in repetitive region.
[re-select step]:Filtered out: chr17:59057075 fall in repetitive region.
[re-select step]:Filtered out: chrX:105406697 fall in repetitive region.
Blacklist file null does not exist!
/miniconda/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator LabelEncoder from version 1.0.1 when using version 1.0.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations
warnings.warn(
[2023-01-23 20:52:45.644] Start to evalute the model:
[2023-01-23 20:52:45.644] Evaluating cascade layer = 0
Running command: sort -k1,1V -k2,2n -o /cromwell_root/WorkingDir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/L1/candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt.sorted /cromwell_root/WorkingDir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/L1/candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt

ohan-Bioinfo commented 1 year ago

The pipeline stops here without an error:

[re-select step]:Filtered out: chr17:59056942 fall in repetitive region.
[re-select step]:Filtered out: chr17:59057075 fall in repetitive region.
[re-select step]:Filtered out: chrX:105406697 fall in repetitive region.
Blacklist file null does not exist!
[2023-01-23 21:25:49.264] Start to evalute the model:
[2023-01-23 21:25:49.264] Evaluating cascade layer = 0 
Running command: sort -k1,1V -k2,2n -o /cromwell_root/WorkingDir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/L1/candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt.sorted /cromwell_root/WorkingDir/01-MD-20-02-001.hg38.aligned.duplicate_marked.sorted/L1/candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt
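
To get an overview of the many "Filter out" and "Filtered out" messages in a log like the one above, they can be tallied by step and reason. This is a minimal sketch: the regular expression is inferred only from the log lines quoted in this thread, the function name `tally_filter_reasons` is hypothetical, and other xTea message formats may not match.

```python
import re
from collections import Counter

# Matches lines such as:
#   [DISC-TD-STEP:] Filter out chr8:42182019, no enough disc support!
#   [re-select step]:Filtered out: chr1:165752252 fall in repetitive region.
FILTER_RE = re.compile(
    r"^\[(?P<step>[^\]]+?):?\]\s*:?\s*Filter(?:ed)? out:?\s+"
    r"(?P<site>\S+?:\d+),?\s+(?P<reason>.+?)\s*$"
)


def tally_filter_reasons(log_lines):
    """Count filter messages per (step, reason) pair; non-matching lines are ignored."""
    counts = Counter()
    for line in log_lines:
        m = FILTER_RE.match(line.strip())
        if m:
            counts[(m.group("step"), m.group("reason"))] += 1
    return counts


log = [
    "[DISC-TD-STEP:] Filter out chr8:42182019, no enough disc support!",
    "[DISC-TD-STEP:] Filter out chr1:108807576, no enough disc support!",
    "[re-select step]:Filtered out: chr17:59056942 fall in repetitive region.",
    "Blacklist file null does not exist!",
]
for (step, reason), n in tally_filter_reasons(log).most_common():
    print(n, step, reason)
```

The tally only summarizes how often each message occurs; it does not say whether a given filter decision is expected for the sample.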