vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

Get input file error #4382

Open OZTaekOppa opened 2 months ago

OZTaekOppa commented 2 months ago

1. What were you trying to do?

Dear vg team,

Thank you for the great program.

FYI, my setup:
- minigraph v0.21 (https://github.com/lh3/minigraph)
- vg v1.56.0 (https://github.com/vgteam/vg)
- vcfbub (https://github.com/pangenome/vcfbub)
- Input file: follows PanSN-spec: Pangenome Sequence Naming (https://github.com/pangenome/PanSN-spec)
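PanSN-spec names each sequence as `sample#haplotype#contig`, using `#` as the delimiter. That is the same `#` the commands in this report try to hand to vg deconstruct. As an illustration only, here is a hypothetical shell helper `pansn_split` that splits such a name into its three fields. It assumes `#` is the delimiter and that the name contains at least two of them.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL helper: split a PanSN name "sample#haplotype#contig" into
# tab-separated fields, assuming '#' is the delimiter.
pansn_split() {
  local name=$1 sample rest hap contig
  case $name in
    *'#'*'#'*) ;;                          # needs at least two delimiters
    *)
      echo "not a PanSN name: $name" >&2
      return 1 ;;
  esac
  sample=${name%%#*}                       # text before the first '#'
  rest=${name#*#}                          # drop "sample#"
  hap=${rest%%#*}                          # text before the second '#'
  contig=${rest#*#}                        # remainder is the contig name
  printf '%s\t%s\t%s\n' "$sample" "$hap" "$contig"
}

# Example name, for illustration only.
pansn_split 'HG002#1#chr1'
```

The example call prints `HG002`, `1`, and `chr1` separated by tabs. A name without two `#` delimiters is rejected with a message on stderr.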

I encountered an issue at the vg deconstruct step while testing the minigraph GFA file built from HPRC data.

2. What did you want to happen?

To obtain LV annotations in the VCF using vg deconstruct.

3. What actually happened?

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

Executed command:
## Step 6: Ensure LV annotations using vg deconstruct
vg snarls ${OUTPUT_DIR}/chm13_t2tctg_mgout.gfa > ${OUTPUT_DIR}/chm13_t2tctg_mgout.snarls
vg deconstruct -e -a '#' -P chm13 --snarls ${OUTPUT_DIR}/chm13_t2tctg_mgout.snarls ${OUTPUT_DIR}/chm13_t2tctg_mgout.gfa > ${OUTPUT_DIR}/chm13_t2tctg_mgout.sv.lv.vcf

## Step 7: Compress the VCF with bgzip and index it with tabix
bgzip -c ${OUTPUT_DIR}/chm13_t2tctg_mgout.sv.lv.vcf > ${OUTPUT_DIR}/chm13_t2tctg_mgout.sv.lv.vcf.gz
tabix -p vcf ${OUTPUT_DIR}/chm13_t2tctg_mgout.sv.lv.vcf.gz

## Step 8: Remove large (> 10 Mb) spurious DELs in MC & PGGB graphs (vcfbub writes plain VCF, so re-compress with bgzip)
singularity exec /singularityimg/pggb_latest.sif vcfbub -l 0 -r 10000000 -i ${OUTPUT_DIR}/chm13_t2tctg_mgout.sv.lv.vcf.gz | bgzip -c > ${OUTPUT_DIR}/chm13_t2tctg_mgout.sv.lv.filtered.vcf.gz

In Step 6, the -H option has been deprecated, which was fine; the problem is the # symbol. I tried the variations '\#', "#", and "\#", but all of them failed. The error message I received was:

+ vg deconstruct -e -a '\#' -P chm13 --snarls /data/minigraph_run/chm13_t2tctg_mgout.snarls /data/minigraph_run/chm13_t2tctg_mgout.gfa
error:[get_input_file_name] unable to open input file: \#
error[VPKG::load_one]: Could not open \# to determine file type
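The error shows exactly which string reached vg. The shell removes the quotes before passing arguments on, so each variation still arrives as one separate argument. Only its characters differ. Some quoting is needed at all because an unquoted `#` at the start of a word begins a shell comment. Below is a small demo with a hypothetical `show_args` helper that prints each argument it receives.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL helper: print each argument the shell passes, one per line,
# wrapped in brackets so the exact characters (including backslashes) show.
show_args() {
  printf '[%s]\n' "$@"
}

# The four quoting variations tried in the report.
show_args '#' "#" '\#' "\#"
```

This prints `[#]`, `[#]`, `[\#]`, `[\#]` on separate lines. Inside double quotes, a backslash before `#` is kept, which is why the error message names `\#`. In every case the token is still a standalone argument.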

5. What data and command can the vg dev team use to make the problem happen?

Human pangenome minigraph

6. What does running vg version say?

v1.56.0

jeizenga commented 2 months ago

It looks like you are trying to provide # as an argument to -a, but that option doesn't take an argument. Because of that, # is being interpreted as a positional argument. The only positional argument that vg deconstruct takes is the graph itself, so it's trying to open # as the graph instead of ${OUTPUT_DIR}/chm13_t2tctg_mgout.gfa.
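This can be illustrated with a toy model of getopt-style option handling. The hypothetical `first_positional` helper below is not vg's own parser. It takes its option table from the commands in this thread: `-e` and `-a` are flags with no value, and `-P`, `-t`, and `--snarls` each take one value. It also assumes the tool opens the first remaining positional argument as the graph, which is consistent with the error above.

```shell
#!/usr/bin/env bash
# Toy model of how a getopt-style CLI splits arguments into options and
# positionals. HYPOTHETICAL helper for illustration; it is not vg's parser.
# ASSUMPTION: -e and -a are flags (no value); -P, -t and --snarls take one value.
first_positional() {
  while [ "$#" -gt 0 ]; do
    case $1 in
      -e|-a)
        shift ;;                          # a flag consumes only itself
      -P|-t|--snarls)
        if [ "$#" -lt 2 ]; then
          echo "missing value for $1" >&2
          return 2
        fi
        shift 2 ;;                        # the option and its value
      --)
        shift
        break ;;                          # everything after -- is positional
      -?*)
        echo "unknown option: $1" >&2
        return 2 ;;
      *)
        printf '%s\n' "$1"                # first positional: opened as the graph
        return 0 ;;
    esac
  done
  if [ "$#" -gt 0 ]; then
    printf '%s\n' "$1"
    return 0
  fi
  echo "no positional argument (graph) given" >&2
  return 1
}

# The command from the error report: '#' lands in the graph slot.
first_positional -e -a '#' -P chm13 --snarls x.snarls graph.gfa
```

The final call prints `#`. Dropping `'#'` from the same command makes it print `graph.gfa`, because `-a` no longer has a stray token after it.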

OZTaekOppa commented 2 months ago

@jeizenga

Thank you for your reply.

Following your suggestions, I used these two commands:

Command 1: vg deconstruct -e -a -t 4 --snarls ${OUTPUT_DIR}/chm13_t2tctg_mgout.snarls ${OUTPUT_DIR}/chm13_t2tctg_mgout.gfa > ${OUTPUT_DIR}/chm13_t2tctg_mgout.sv.lv.vcf

Command 2: vg deconstruct -e -a -t 4 -P chm13 --snarls ${OUTPUT_DIR}/chm13_t2tctg_mgout.snarls ${OUTPUT_DIR}/chm13_t2tctg_mgout.gfa > ${OUTPUT_DIR}/chm13_t2tctg_mgout.sv.lv.vcf

Both commands generated only the meta-information lines. Of the header and data lines, only the header line was written to chm13_t2tctg_mgout.sv.lv.vcf; there were no data lines.
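This symptom (headers but no variant records) can be confirmed with a hypothetical helper, `count_vcf_records`, which counts the non-`#` lines of a VCF. It assumes GNU gzip, where `-c -d -f` decompresses gzip/bgzip input and copies plain text through unchanged. That lets the same helper handle both the `.vcf` and the `.vcf.gz` from Step 7.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL helper: count VCF data records (non-empty lines not starting with '#').
# ASSUMPTION: GNU gzip, where -c -d -f decompresses gzip/bgzip input and copies
# plain text through unchanged, so both .vcf and .vcf.gz work.
count_vcf_records() {
  gzip -cdf -- "$1" | awk 'NF && !/^#/ { n++ } END { print n + 0 }'
}
```

For example, `count_vcf_records ${OUTPUT_DIR}/chm13_t2tctg_mgout.sv.lv.vcf` printing `0` would match the output described here.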


Did I miss something?

Kind regards,

Taek

jeizenga commented 2 months ago

I think I'll redirect this question to @glennhickey

glennhickey commented 2 months ago

You can't run vg deconstruct on minigraph output because it doesn't have the (non-reference) paths embedded in it. vg deconstruct needs the path information to work.
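In GFA, haplotype paths are stored either as P lines (GFA 1.0) or as W walk lines (GFA 1.1). A rough pre-check before running deconstruct is to list which samples have such lines. The sketch below uses a hypothetical helper, `list_path_samples`. It inspects only P and W lines, so treat it as a heuristic rather than a substitute for what vg itself reports.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL helper: list the distinct samples that have path lines in a GFA.
# GFA 1.0 P lines carry a path name in column 2; under PanSN the sample is the
# text before the first '#'. GFA 1.1 W lines carry the sample ID in column 2.
# Only P and W lines are inspected, so this is a rough check.
list_path_samples() {
  awk -F'\t' '
    $1 == "P" { split($2, parts, "#"); print parts[1] }
    $1 == "W" { print $2 }
  ' "$1" | LC_ALL=C sort -u            # C locale for a stable sort order
}
```

Run it as `list_path_samples ${OUTPUT_DIR}/chm13_t2tctg_mgout.gfa`. Empty output, or output showing only the reference sample, suggests there are no non-reference haplotype paths for deconstruct to report.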

OZTaekOppa commented 2 months ago

@glennhickey, thanks for your reply. I will get back to you after testing again with the PGGB GFA files.