Hi,
I am trying the graph_peak_caller in ATAC-seq mode as you suggested in #8 . I am following the pipeline in vg to construct the variation graph. I worked on the mouse genome, in which case 21 graphs for each chromosome was created. But I still expensed the same kinds of error as shown in #8, in the creating learn map step for most of the chromosomes (only Chr9, ChrX and ChrY work fine) :
[peter.huang@node070 peak]$ graph_peak_caller callpeaks -a True -g /secondary/projects/immunograph/vg/scripts/indexGCSA/graphs/chr1.nobg -s /secondary/projects/immunograph/vg/results/gam/10B_L000_Interleaves_chr1.json -n "" -f 150 -r 50 -p True -u $unique_reads -G 2500000000 2020-09-15 10:09:25,605, INFO: Sample files: ['/secondary/projects/immunograph/vg/results/gam/10B_L000_Interleaves_chr1.json'] 2020-09-15 10:09:25,605, INFO: Using graphs: ['/secondary/projects/immunograph/vg/scripts/indexGCSA/graphs/chr1.nobg'] 2020-09-15 10:09:25,605, INFO: Will use sequence graphs. ['/secondary/projects/immunograph/vg/scripts/indexGCSA/graphs/chr1.nobg.sequences'] 2020-09-15 10:09:25,605, INFO: Using graphs from data directory /secondary/projects/immunograph/vg/scripts/indexGCSA/graphs 2020-09-15 10:09:25,605, INFO: Will use [''] as extra experiments names for each run, based on graph file names.If only running on single graph, this should be empty. 2020-09-15 10:09:25,608, WARNING: Did not find linear map for for graph /secondary/projects/immunograph/vg/scripts/indexGCSA/graphs/chr1.nobg. Will create. 2020-09-15 10:09:25,608, WARNING: Did not find linear map for for graph /secondary/projects/immunograph/vg/scripts/indexGCSA/graphs/chr1.nobg. Will create. 2020-09-15 10:09:29,838, INFO: Getting topologically sorted nodes 2020-09-15 10:09:29,839, INFO: 0 nodes processed 2020-09-15 10:09:29,839, INFO: Finding starts and ends 2020-09-15 10:09:29,839, INFO: 0 nodes processed Traceback (most recent call last): File "/secondary/projects/triche/tools/anaconda2/envs/r_env/bin/graph_peak_caller", line 11, in <module> load_entry_point('graph-peak-caller', 'console_scripts', 'graph_peak_caller')() File "/secondary/projects/triche/Peter/tools/graph_peak_caller/graph_peak_caller/command_line_interface.py", line 39, in main run_argument_parser(sys.argv[1:]) File "/secondary/projects/triche/Peter/tools/graph_peak_caller/graph_peak_caller/command_line_interface.py", line 679, in run_argument_parser args.func(args) File "/secondary/projects/triche/Peter/tools/graph_peak_caller/graph_peak_caller/callpeaks_interface.py", line 182, in run_callpeaks2 create_linear_map(graph, linear_map_name) File "/secondary/projects/triche/Peter/tools/graph_peak_caller/graph_peak_caller/util.py", line 6, in create_linear_map linear_map = LinearMap.from_graph(ob_graph) File "/secondary/projects/triche/Peter/tools/graph_peak_caller/graph_peak_caller/control/linearmap.py", line 106, in from_graph starts = cls.find_starts(graph, node_ids) File "/secondary/projects/triche/Peter/tools/graph_peak_caller/graph_peak_caller/control/linearmap.py", line 134, in find_starts max_dists[j] = max(cur_dist, max_dists[j]) IndexError: index -21804899 is out of bounds for axis 0 with size 3
In #8, you suggest that the error may be caused by a disconnected graph. But because I was using vg, I doubt that is the case, although I am not so sure about it. I wonder how you manage to find the disconnection in graph as you comment in #8 :
" For instance, node 133450827 has an edge going to 135833768, but these two nodes don't have any other edges to any nodes, so they are isolated together."
And also, I wonder if you any further suggestions about this error. Thank you so much!
This error was introduced after VG made slight changes in their JSON format output, representing nodes as strings instead of integers. There was a fix introduced. Did you try the latest version?
Hi, I am trying the graph_peak_caller in ATAC-seq mode as you suggested in #8 . I am following the pipeline in vg to construct the variation graph. I worked on the mouse genome, in which case 21 graphs for each chromosome was created. But I still expensed the same kinds of error as shown in #8, in the creating learn map step for most of the chromosomes (only Chr9, ChrX and ChrY work fine) :
[peter.huang@node070 peak]$ graph_peak_caller callpeaks -a True -g /secondary/projects/immunograph/vg/scripts/indexGCSA/graphs/chr1.nobg -s /secondary/projects/immunograph/vg/results/gam/10B_L000_Interleaves_chr1.json -n "" -f 150 -r 50 -p True -u $unique_reads -G 2500000000 2020-09-15 10:09:25,605, INFO: Sample files: ['/secondary/projects/immunograph/vg/results/gam/10B_L000_Interleaves_chr1.json'] 2020-09-15 10:09:25,605, INFO: Using graphs: ['/secondary/projects/immunograph/vg/scripts/indexGCSA/graphs/chr1.nobg'] 2020-09-15 10:09:25,605, INFO: Will use sequence graphs. ['/secondary/projects/immunograph/vg/scripts/indexGCSA/graphs/chr1.nobg.sequences'] 2020-09-15 10:09:25,605, INFO: Using graphs from data directory /secondary/projects/immunograph/vg/scripts/indexGCSA/graphs 2020-09-15 10:09:25,605, INFO: Will use [''] as extra experiments names for each run, based on graph file names.If only running on single graph, this should be empty. 2020-09-15 10:09:25,608, WARNING: Did not find linear map for for graph /secondary/projects/immunograph/vg/scripts/indexGCSA/graphs/chr1.nobg. Will create. 2020-09-15 10:09:25,608, WARNING: Did not find linear map for for graph /secondary/projects/immunograph/vg/scripts/indexGCSA/graphs/chr1.nobg. Will create. 2020-09-15 10:09:29,838, INFO: Getting topologically sorted nodes 2020-09-15 10:09:29,839, INFO: 0 nodes processed 2020-09-15 10:09:29,839, INFO: Finding starts and ends 2020-09-15 10:09:29,839, INFO: 0 nodes processed Traceback (most recent call last): File "/secondary/projects/triche/tools/anaconda2/envs/r_env/bin/graph_peak_caller", line 11, in <module> load_entry_point('graph-peak-caller', 'console_scripts', 'graph_peak_caller')() File "/secondary/projects/triche/Peter/tools/graph_peak_caller/graph_peak_caller/command_line_interface.py", line 39, in main run_argument_parser(sys.argv[1:]) File "/secondary/projects/triche/Peter/tools/graph_peak_caller/graph_peak_caller/command_line_interface.py", line 679, in run_argument_parser args.func(args) File "/secondary/projects/triche/Peter/tools/graph_peak_caller/graph_peak_caller/callpeaks_interface.py", line 182, in run_callpeaks2 create_linear_map(graph, linear_map_name) File "/secondary/projects/triche/Peter/tools/graph_peak_caller/graph_peak_caller/util.py", line 6, in create_linear_map linear_map = LinearMap.from_graph(ob_graph) File "/secondary/projects/triche/Peter/tools/graph_peak_caller/graph_peak_caller/control/linearmap.py", line 106, in from_graph starts = cls.find_starts(graph, node_ids) File "/secondary/projects/triche/Peter/tools/graph_peak_caller/graph_peak_caller/control/linearmap.py", line 134, in find_starts max_dists[j] = max(cur_dist, max_dists[j]) IndexError: index -21804899 is out of bounds for axis 0 with size 3
In #8, you suggest that the error may be caused by a disconnected graph. But because I was using vg, I doubt that is the case, although I am not so sure about it. I wonder how you manage to find the disconnection in graph as you comment in #8 :
" For instance, node 133450827 has an edge going to 135833768, but these two nodes don't have any other edges to any nodes, so they are isolated together."
And also, I wonder if you any further suggestions about this error. Thank you so much!
Best, Peter
@ttriche