uio-bmi / graph_peak_caller

ChIP-seq peak caller for reads mapped to a graph-based reference genome
BSD 3-Clause "New" or "Revised" License

create_ob_graph failure #11

Open byoo opened 3 years ago

byoo commented 3 years ago

Hello, I would like to ask for your advice on creating an offset-based graph using create_ob_graph. Could you advise me on how to resolve the error below? The input JSON for create_ob_graph comes from a vg file that was converted from a GFA file created by minigraph.

2021-01-28 01:43:03,592, INFO: Setting sequences using vg json graph graph_p.json
Traceback (most recent call last):
  File "graph_peak_caller", line 8, in <module>
    sys.exit(main())
  File "graph_peak_caller/command_line_interface.py", line 36, in main
    run_argument_parser(sys.argv[1:])
  File "graph_peak_caller/command_line_interface.py", line 673, in run_argument_parser
    args.func(args)
  File "graph_peak_caller/preprocess_interface.py", line 67, in create_ob_graph
    sequence_graph.set_sequences_using_vg_json_graph(args.vg_json_file_name)
  File "offsetbasedgraph/sequencegraph.py", line 71, in set_sequences_using_vg_json_graph
    self.set_sequence(int(node_object["id"]), node_object["sequence"])
  File "offsetbasedgraph/sequencegraph.py", line 94, in set_sequence
    assert node_size == len(sequence), "Invalid sequence. Does not cover whole node"
AssertionError: Invalid sequence. Does not cover whole node
ivargr commented 3 years ago

Hi!

It seems that it crashes because it thinks there is a node in the graph having a sequence that doesn't match the node size.

Would you be able to send med the vg graph your are using, and I could check whether there is an error in the code or something wrong with the graph?

Thanks!

byoo commented 3 years ago

Thanks for the quick reply! What’s med? Do you mean the m

ivargr commented 3 years ago

Sorry, I had a typo there, I meant to ask "Would you be able to send me the vg graph you are using...?". I see you have a file called graph_p.json, you could alternatively just send me that (but I guess you also have a .vg-file that is smaller).

byoo commented 3 years ago

Sorry, I sent the previous message by mistake. Yes, you are right. BTW, even the vg file is over 3 GB in size, so I wonder if it is possible to find the data causing the error and extract it. I am new to working with vg files. I'd appreciate it if you could guide me. Thank you.
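
One way to narrow this down without sharing the whole file might be to scan the JSON for nodes that look suspicious. The sketch below is only an illustration: it assumes the JSON is written as one graph chunk per line with a "node" list whose entries carry "id" and "sequence" (the fields the traceback reads), and `find_suspect_nodes` is a hypothetical helper, not part of graph_peak_caller or offsetbasedgraph. It flags nodes with a missing or empty sequence and ids that reappear with a different sequence length; it may well not catch the actual cause of the assertion.

```python
import json


def find_suspect_nodes(lines):
    """Return (node_id, reason) pairs for nodes that look inconsistent.

    Assumption: each non-empty line is one JSON object with an optional
    "node" list, and every node has an "id" and (usually) a "sequence".
    """
    seen_lengths = {}  # node id -> sequence length at first occurrence
    suspects = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        chunk = json.loads(line)
        for node in chunk.get("node", []):
            node_id = str(node.get("id"))
            sequence = node.get("sequence", "")
            if not sequence:
                suspects.append((node_id, "missing or empty sequence"))
            # Remember the first length seen for this id and compare later ones.
            previous = seen_lengths.setdefault(node_id, len(sequence))
            if previous != len(sequence):
                suspects.append((node_id, "repeated id with a different length"))
    return suspects


# Possible usage, with the file name taken from the log above:
# with open("graph_p.json") as handle:
#     for node_id, reason in find_suspect_nodes(handle):
#         print(node_id, reason)
```

Passing the open file handle rather than reading the file at once keeps only one line in memory at a time, which matters for a graph of this size.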

ivargr commented 3 years ago

It is a bit tricky without having the graph, since it seems like there might be an error in the graph. Maybe you could explain the steps/pipeline you used to create the graph, and I can see if I can understand how you got the error from there?

byoo commented 3 years ago

The steps to create the graph are: 1) perform de novo assembly using hifiasm, 2) build a GFA using minigraph, 3) convert the GFA to vg and then to JSON using vg. The error occurs in a small subset of the graph here. Thanks!

Sorry, I just read that everything in the graph needs to be connected. The graph includes all the chromosomes, so that may be part of the issue.
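
To see whether the graph really is split into several disconnected parts, a rough check is to count connected components from the JSON. This is a sketch under assumptions: it expects one JSON chunk per line with "node" entries that have an "id" and "edge" entries that have "from" and "to", it ignores edge orientation, and `count_components` is a hypothetical helper rather than anything provided by vg or graph_peak_caller. It holds the whole adjacency structure in memory, so it may be impractical for the full 3 GB graph.

```python
import json
from collections import defaultdict


def count_components(lines):
    """Count connected components, treating every edge as undirected.

    Assumption: each non-empty line is a JSON object with optional
    "node" and "edge" lists in the layout described above.
    """
    neighbours = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        chunk = json.loads(line)
        for node in chunk.get("node", []):
            neighbours[str(node["id"])]  # register nodes that have no edges
        for edge in chunk.get("edge", []):
            start, end = str(edge["from"]), str(edge["to"])
            neighbours[start].add(end)
            neighbours[end].add(start)

    unvisited = set(neighbours)
    components = 0
    while unvisited:
        components += 1
        stack = [unvisited.pop()]
        # Depth-first walk over everything reachable from this start node.
        while stack:
            current = stack.pop()
            for other in neighbours[current]:
                if other in unvisited:
                    unvisited.remove(other)
                    stack.append(other)
    return components
```

A result greater than 1 would mean the graph has separate pieces, for example one per chromosome.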