wjian8 / psvcp_v1.01

Pan-genome Construction and Population Structure Variation Calling pipeline
GNU General Public License v3.0

Errors occurred while running example data #5

Open tanwei123456 opened 1 year ago

tanwei123456 commented 1 year ago

Dear author, hello. When I run `python3 ../Construct_pan_and_Call_sv.py genome_gff_dir_example genome_list -fqd fq_dir_example -o step1`, I get the following error:

`Input delta file: pan_dir_result/ref2_R498_0-2M/ref2_R498_0-2M.delta
Output prefix: pan_dir_result/ref2_R498_0-2M/ref2_R498_0-2M.bed
Unique anchor length: 1000
Minimum variant size to call: 50
Maximum variant size to call: 10000000
Logging progress updates in pan_dir_result/ref2_R498_0-2M/progress.log
1. Filter delta file
2. Finding variants between alignments
Loaded 141 alignments
3. Finding variants within alignments
4. Combine variants between and within alignments
Warning messages:
1: Removed 12 rows containing missing values (`geom_bar()`). 
2: Removed 10 rows containing missing values (`geom_bar()`). 
3: The dot-dot notation (`..count..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(count)` instead. 
4: Removed 12 rows containing missing values (`geom_bar()`). 
5: Removed 12 rows containing missing values (`geom_bar()`). 
6: Removed 10 rows containing missing values (`geom_bar()`). 
7: Removed 12 rows containing missing values (`geom_bar()`). 
Warning messages:
1: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead. 
2: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
ℹ Please use the `linewidth` argument instead. 
Warning message:
Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead. 
index file pan_dir_result/ref2.fa.fai not found, generating...
Traceback (most recent call last):
  File "/gss1/home/gaozhh01/biosoft/psvcp_v1.01/example7/../construct_pan_script/11gene_in_pv_screen.py", line 7, in <module>
    df = pd.read_csv(sys.argv[1],sep='\t',header=None)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gss1/home/gaozhh01/miniconda3/envs/pan/lib/python3.11/site-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/gss1/home/gaozhh01/miniconda3/envs/pan/lib/python3.11/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/gss1/home/gaozhh01/miniconda3/envs/pan/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gss1/home/gaozhh01/miniconda3/envs/pan/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 605, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gss1/home/gaozhh01/miniconda3/envs/pan/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1442, in __init__
    self._engine = self._make_engine(f, self.engine)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gss1/home/gaozhh01/miniconda3/envs/pan/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1735, in _make_engine
    self.handles = get_handle(
                   ^^^^^^^^^^^
  File "/gss1/home/gaozhh01/miniconda3/envs/pan/lib/python3.11/site-packages/pandas/io/common.py", line 856, in get_handle
    handle = open(
             ^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'pan_dir_result/R498_0-2M.gff.in_pv'
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'pan_dir_result/R498_0-2M.gff_gene_absolutly.in_pv': No such file or directory
Execution halted
cat: pan_dir_result/R498_0-2M.gene_absolutly_in_pv.gff: No such file or directory`

This is the resulting file. (image)

R version: R-4.0.3; Python version: python3.9.4.env. I would appreciate it if you could answer my questions!

tanwei 2023.03.15

wjian8 commented 1 year ago

You have already got the pan.fa and pan.gff.

There is no 'pan_dir_result/R498_0-2M.gff.in_pv' because the pipeline could not find any gene (or CDS, exon) on the PAVs that would be inserted into ref2.genome. When the pipeline does find genes on the PAVs to be inserted into ref.genome, it generates a gene_absolutly_in_pv.gff file.

For example, in the example case, CN1_0-2M.gene_absolutly_in_pv.gff contains the genes on the PAVs that would be inserted into ref1.genome. ref1.update2.gff is then combined with the CN1_0-2M.gene_absolutly_in_pv.gff information, and the result is ref2.gff. If there is no gene_absolutly_in_pv.gff, ref2.update2.gff is the same as ref3.gff.

So these messages do not indicate a problem with the pangenome construction.
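The merge rule described above (append the in-PAV gene records when they exist, otherwise carry the updated GFF forward unchanged) can be sketched roughly as follows. This is not the pipeline's actual code. The function name `build_next_gff` and its arguments are hypothetical, chosen only for illustration, and the sketch assumes plain-text GFF files that fit in memory.

```python
from pathlib import Path


def build_next_gff(updated_gff, in_pv_gff, out_gff):
    """Hypothetical helper: write out_gff from updated_gff plus optional in-PAV genes.

    Returns True if in-PAV gene records were appended, False if the
    in-PAV file was missing or empty and updated_gff was copied as-is.
    """
    updated_gff, in_pv_gff, out_gff = Path(updated_gff), Path(in_pv_gff), Path(out_gff)

    # A missing or empty in-PAV file means no gene was found on the PAVs;
    # treat that as a normal case instead of raising FileNotFoundError.
    has_genes = in_pv_gff.is_file() and in_pv_gff.stat().st_size > 0

    base = updated_gff.read_text()
    if base and not base.endswith("\n"):
        base += "\n"  # keep records on separate lines when appending

    extra = in_pv_gff.read_text() if has_genes else ""
    out_gff.write_text(base + extra)
    return has_genes
```

For instance, calling `build_next_gff("ref2.update2.gff", "R498_0-2M.gene_absolutly_in_pv.gff", "ref3.gff")` with a missing in-PAV file would return `False` and leave `ref3.gff` identical to `ref2.update2.gff`, which matches the behaviour described in the reply.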