uio-bmi / graph_peak_caller

ChIP-seq peak caller for reads mapped to a graph-based reference genome
BSD 3-Clause "New" or "Revised" License
19 stars 8 forks source link

Error in creating linear map when in ATAC-seq mode #9

Open huangyizR opened 4 years ago

huangyizR commented 4 years ago

Hi, I am trying the graph_peak_caller in ATAC-seq mode as you suggested in #8 . I am following the pipeline in vg to construct the variation graph. I worked on the mouse genome, in which case 21 graphs for each chromosome was created. But I still expensed the same kinds of error as shown in #8, in the creating learn map step for most of the chromosomes (only Chr9, ChrX and ChrY work fine) :

[peter.huang@node070 peak]$ graph_peak_caller callpeaks -a True -g /secondary/projects/immunograph/vg/scripts/indexGCSA/graphs/chr1.nobg -s /secondary/projects/immunograph/vg/results/gam/10B_L000_Interleaves_chr1.json -n "" -f 150 -r 50 -p True -u $unique_reads -G 2500000000 2020-09-15 10:09:25,605, INFO: Sample files: ['/secondary/projects/immunograph/vg/results/gam/10B_L000_Interleaves_chr1.json'] 2020-09-15 10:09:25,605, INFO: Using graphs: ['/secondary/projects/immunograph/vg/scripts/indexGCSA/graphs/chr1.nobg'] 2020-09-15 10:09:25,605, INFO: Will use sequence graphs. ['/secondary/projects/immunograph/vg/scripts/indexGCSA/graphs/chr1.nobg.sequences'] 2020-09-15 10:09:25,605, INFO: Using graphs from data directory /secondary/projects/immunograph/vg/scripts/indexGCSA/graphs 2020-09-15 10:09:25,605, INFO: Will use [''] as extra experiments names for each run, based on graph file names.If only running on single graph, this should be empty. 2020-09-15 10:09:25,608, WARNING: Did not find linear map for for graph /secondary/projects/immunograph/vg/scripts/indexGCSA/graphs/chr1.nobg. Will create. 2020-09-15 10:09:25,608, WARNING: Did not find linear map for for graph /secondary/projects/immunograph/vg/scripts/indexGCSA/graphs/chr1.nobg. Will create. 2020-09-15 10:09:29,838, INFO: Getting topologically sorted nodes 2020-09-15 10:09:29,839, INFO: 0 nodes processed 2020-09-15 10:09:29,839, INFO: Finding starts and ends 2020-09-15 10:09:29,839, INFO: 0 nodes processed Traceback (most recent call last): File "/secondary/projects/triche/tools/anaconda2/envs/r_env/bin/graph_peak_caller", line 11, in <module> load_entry_point('graph-peak-caller', 'console_scripts', 'graph_peak_caller')() File "/secondary/projects/triche/Peter/tools/graph_peak_caller/graph_peak_caller/command_line_interface.py", line 39, in main run_argument_parser(sys.argv[1:]) File "/secondary/projects/triche/Peter/tools/graph_peak_caller/graph_peak_caller/command_line_interface.py", line 679, in run_argument_parser args.func(args) File "/secondary/projects/triche/Peter/tools/graph_peak_caller/graph_peak_caller/callpeaks_interface.py", line 182, in run_callpeaks2 create_linear_map(graph, linear_map_name) File "/secondary/projects/triche/Peter/tools/graph_peak_caller/graph_peak_caller/util.py", line 6, in create_linear_map linear_map = LinearMap.from_graph(ob_graph) File "/secondary/projects/triche/Peter/tools/graph_peak_caller/graph_peak_caller/control/linearmap.py", line 106, in from_graph starts = cls.find_starts(graph, node_ids) File "/secondary/projects/triche/Peter/tools/graph_peak_caller/graph_peak_caller/control/linearmap.py", line 134, in find_starts max_dists[j] = max(cur_dist, max_dists[j]) IndexError: index -21804899 is out of bounds for axis 0 with size 3

In #8, you suggest that the error may be caused by a disconnected graph. But because I was using vg, I doubt that is the case, although I am not so sure about it. I wonder how you manage to find the disconnection in graph as you comment in #8 :

" For instance, node 133450827 has an edge going to 135833768, but these two nodes don't have any other edges to any nodes, so they are isolated together."

And also, I wonder if you any further suggestions about this error. Thank you so much!

Best, Peter

@ttriche

cgroza commented 3 months ago

This error was introduced after VG made slight changes in their JSON format output, representing nodes as strings instead of integers. There was a fix introduced. Did you try the latest version?