Open yihangs opened 1 year ago
Pinging @cgroza who did this work.
Thank you so much! I have two follow-up questions:
call_peaks.nf
, and it seems that you didn't use vg filter
, but instead used graph_peak_caller count_unique_reads
. I don't quite understand this function, is it another way to do filtering? Hi,
I used the raw VCF. For the meaning of decomposed VCF, see https://github.com/human-pangenomics/hpp_pangenome_resources#vcf-decomposition
We chose not to filter the reads in this case, since many of the SVs have lower mappability with short reads libraries. For your use case, it may make sense to filter the reads. The number of unique reads is a parameter in GraphPeakCaller. Sometimes ChIP-seq will produce two identical reads, which this number excludes.
Thanks! I will try it. One more question:
vg mod --unfold
can also help convert graphs to DAG, Have you tried that? If so, did you meet any issues?I recall making an attempt then giving up. I did not try since then. The real solution would be to write a peak caller that generalized to cyclic graphs.
Cristian Groza ------- Original Message ------- On Monday, October 2nd, 2023 at 4:25 PM, yihangs @.***> wrote:
Thanks! I will try it. One more question.
- It seems that vg mod --unfold can also help convert graphs to DAG, Have you tried that? If so, did you meet any issues?
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>
Hi,
I am reproducing the results according to https://github.com/cgroza/personalized_genomes_gbio/issues/1 via the following steps:
hprc-v1.1-mc-grch38.raw.vcf.gz
.python strip-nested.py hprc-v1.1-mc-grch38.raw.vcf.gz > hprc-v1.1-mc-grch38.raw.filter.vcf
hprc-v1.1-mc-grch38.raw.filter.vcf
vg construct -S -f -r hg38.fa -v hprc-v1.1-mc-grch38.raw.filter.vcf > graph.vg
However, an error occurs in the last step, with error messages: https://github.com/vgteam/vg/issues/4109.
Did you meet this error? According to the error messages, I suspect that it is because vg construct
cannot read super large SVs, hence, when you run strip-nested.py
, did you provide a threshold on the variable max_len
?
Thank you!
Yes I did.
Try 10 kbp.
Hi,
I use 10kbp to filter the vcf file, python strip-nested.py hprc-v1.1-mc-grch38.raw.vcf.gz 10000 > hprc-v1.1-mc-grch38.raw.filter.10kb.vcf
The vg construct
step works fine, however, an error occurs when I do the index vg index -x hprc-v1.1-mc-grch38-filter-10kb.xg -g hprc-v1.1-mc-grch38-filter-10kb.gcsa -b ./tmp/ -p hprc-v1.1-mc-grch38-filter-10kb.vg -t 20
with error message:
Building XG index
Saving XG index to hprc-v1.1-mc-grch38-filter-10kb.xg
Generating kmer files...
Building the GCSA2 index...
InputGraph::InputGraph(): 10793035616 kmers in 1 file(s)
InputGraph::read(): Read 10793035616 16-mers from ./tmp//vg-OBo2ha/vg-kmers-tmp-5QqyQ1
InputGraph::readKeys(): 1730614098 unique keys
InputGraph::read(): Read 10793035616 16-mers from ./tmp//vg-OBo2ha/vg-kmers-tmp-5QqyQ1
InputGraph::readFrom(): 8906588953 unique start nodes
InputGraph::read(): Read 10793035616 16-mers from ./tmp//vg-OBo2ha/vg-kmers-tmp-5QqyQ1
PathGraph::PathGraph(): 10793035616 paths with 21586071232 ranks
PathGraph::PathGraph(): 321.658 GB in 1 file(s)
GCSA::GCSA(): Preprocessing: 4328.06 seconds, 394.435 GB
GCSA::GCSA(): Prefix-doubling from path length 16
GCSA::GCSA(): Step 1 (path length 16 -> 32)
PathGraph::prune(): 10793035616 -> 10777627852 paths (1728649963 ranges)
PathGraph::prune(): 700183376 unique, 0 redundant, 10077437423 unsorted, 7053 nondeterministic paths
PathGraph::prune(): 321.198 GB in 1 file(s)
PathGraph::read(): File 0: Read 10777627852 order-16 paths
PathGraphBuilder::write(): Size limit exceeded, construction aborted
Looks like it is still a memory issue. Did you do any special hyper-parameter settings on the index step?
Thanks!
You may have more luck with vg autoindex, which is a built in function that will take a VCF and construct and index a graph for you.
Thanks! Which workflow do you use, giraffe or map? would giraffe index .gbz
more memory efficient?
I used map. Giraffe should work too.
Cristian Groza
Sent with Proton Mail secure email.
------- Original Message ------- On Friday, October 6th, 2023 at 11:02 AM, yihangs @.***> wrote:
Thanks! Which workflow do you use, giraffe or map? would giraffe index .gbz more memory efficient?
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>
Hi,
I tried autoindex but still get an error when constructing GCSA:
[IndexRegistry]: Constructing GCSA/LCP indexes.
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
━━━━━━━━━━━━━━━━━━━━
Crash report for vg v1.49.0-9-ge61a8a475 "Peschici"
Stack trace (most recent call last):
#23 Object "", at 0xffffffffffffffff, in
#22 Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff383818f9, in _start
#21 Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7f2d4bf17b96, in __libc_start_main
Source "../csu/libc-start.c", line 310, in __libc_start_main [0x7f2d4bf17b96]
#20 Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38355bea, in main
Source "src/main.cpp", line 88, in main [0x55ff38355bea]
#19 Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38ac5c97, in vg::subcommand::Subcommand::operator()(int, char**) const
| Source "src/subcommand/subcommand.cpp", line 75, in operator()
Source "/usr/include/c++/9/bits/std_function.h", line 688, in operator() [0x55ff38ac5c97]
685: {
686: if (_M_empty())
687: __throw_bad_function_call();
> 688: return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
689: }
690:
691: #if __cpp_rtti
#18 Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38af3c5c, in main_autoindex(int, char**)
Source "src/subcommand/autoindex_main.cpp", line 360, in main_autoindex [0x55ff38af3c5c]
#17 Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38fc398e, in vg::IndexRegistry::make_indexes(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)
Source "src/index_registry.cpp", line 4253, in make_indexes [0x55ff38fc398e]
#16 Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38facc49, in vg::IndexRegistry::execute_recipe(std::pair<std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, unsigned long> const&, vg::IndexingPlan const*, vg::AliasGraph&)
| Source "src/index_registry.cpp", line 5204, in execute
| Source "src/index_registry.cpp", line 5369, in operator()
Source "/usr/include/c++/9/bits/std_function.h", line 688, in execute_recipe [0x55ff38facc49]
685: {
686: if (_M_empty())
687: __throw_bad_function_call();
> 688: return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
689: }
690:
691: #if __cpp_rtti
#15 Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38fa7759, in std::_Function_handler<std::vector<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > (std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, vg::AliasGraph&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&), vg::VGIndexes::get_vg_index_registry()::{lambda(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, vg::AliasGraph&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)#51}>::_M_invoke(std::_Any_data const&, std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*&&, vg::AliasGraph&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)
| Source "/usr/include/c++/9/bits/std_function.h", line 286, in operator()
| 284: {
| 285: return (*_Base::_M_get_pointer(__functor))(
| > 286: std::forward<_ArgTypes>(__args)...);
| 287: }
| 288: };
Source "src/index_registry.cpp", line 3651, in _M_invoke [0x55ff38fa7759]
#14 Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38fa7397, in vg::VGIndexes::get_vg_index_registry()::{lambda(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)#50}::operator()(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) const [clone .isra.0]
Source "src/index_registry.cpp", line 3601, in operator() [0x55ff38fa7397]
#13 Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38f96f25, in vg::execute_in_fork(std::function<void ()> const&)
| Source "src/index_registry.cpp", line 408, in operator()
Source "/usr/include/c++/9/bits/std_function.h", line 688, in execute_in_fork [0x55ff38f96f25]
685: {
686: if (_M_empty())
687: __throw_bad_function_call();
> 688: return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
689: }
690:
691: #if __cpp_rtti
#12 Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38f94c98, in vg::VGIndexes::get_vg_index_registry()::{lambda(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)#50}::operator()(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) const::{lambda()#1}::operator()() const
Source "src/index_registry.cpp", line 3613, in operator() [0x55ff38f94c98]
#11 Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff39247b28, in gcsa::GCSA::GCSA(gcsa::InputGraph&, gcsa::ConstructionParameters const&)
Source "src/gcsa.cpp", line 505, in GCSA [0x55ff39247b28]
#10 Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff3927da61, in gcsa::PathGraph::extend(unsigned long)
Source "src/path_graph.cpp", line 1011, in extend [0x55ff3927da61]
#9 Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff3927cc60, in gcsa::PathGraphBuilder::sort(unsigned long)
Source "src/path_graph.cpp", line 405, in sort [0x55ff3927cc60]
#8 Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff39279794, in gcsa::PathGraph::read(std::vector<gcsa::PathNode, std::allocator<gcsa::PathNode> >&, std::vector<unsigned int, std::allocator<unsigned int> >&, unsigned long) const
| Source "src/path_graph.cpp", line 1079, in resize
Source "/usr/include/c++/7/bits/stl_vector.h", line 692, in read [0x55ff39279794]
689: resize(size_type __new_size)
690: {
691: if (__new_size > size())
> 692: _M_default_append(__new_size - size());
693: else if (__new_size < size())
694: _M_erase_at_end(this->_M_impl._M_start + __new_size);
695: }
#7 Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38cca438, in std::vector<unsigned int, std::allocator<unsigned int> >::_M_default_append(unsigned long)
| Source "/usr/include/c++/9/bits/vector.tcc", line 635, in _M_allocate
| 633: const size_type __len =
| 634: _M_check_len(__n, "vector::_M_default_append");
| > 635: pointer __new_start(this->_M_allocate(__len));
| 636: if _GLIBCXX17_CONSTEXPR (_S_use_relocate())
| 637: {
| Source "/usr/include/c++/9/bits/stl_vector.h", line 343, in allocate
| 341: {
| 342: typedef __gnu_cxx::__alloc_traits<_Tp_alloc_type> _Tr;
| > 343: return __n != 0 ? _Tr::allocate(_M_impl, __n) : pointer();
| 344: }
| Source "/usr/include/c++/9/bits/alloc_traits.h", line 443, in allocate
| 441: _GLIBCXX_NODISCARD static pointer
| 442: allocate(allocator_type& __a, size_type __n)
| > 443: { return __a.allocate(__n); }
| 444:
| 445: /**
Source "/usr/include/c++/9/ext/new_allocator.h", line 114, in _M_default_append [0x55ff38cca438]
111: return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp), __al));
112: }
113: #endif
> 114: return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp)));
115: }
116:
117: // __p is not permitted to be a null pointer.
#6 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29", at 0x7f2d4c7decea, in
#5 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29", at 0x7f2d4c7ea7f4, in __cxa_throw
#4 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29", at 0x7f2d4c7ea570, in std::terminate()
#3 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29", at 0x7f2d4c7ea505, in
#2 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29", at 0x7f2d4c7df0a8, in
#1 Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7f2d4bf36800, in abort
Source "/build/glibc-OTsEL5/glibc-2.27/stdlib/abort.c", line 79, in abort [0x7f2d4bf36800]
#0 Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7f2d4bf34e97, in raise
Source "../sysdeps/unix/sysv/linux/raise.c", line 51, in raise [0x7f2d4bf34e97]
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Please include this entire error log in your bug report!
━━━━━━━━━━━━━━━━━━━━
But then the program captures this error, does the pruning step and constructs the index again.
warning:[IndexRegistry] Child process 37623 failed with status 34304 representing exit code 134
[IndexRegistry]: Exceeded disk use limit while performing k-mer doubling steps. Rewinding to pruning step with more aggressive pruning to simplify the graph.
[IndexRegistry]: Pruning complex regions of VG to prepare for GCSA indexing with GBWT unfolding.
[IndexRegistry]: Constructing GCSA/LCP indexes.
It runs slow, and haven't finish with three days.
Did you also meet this situation? Or maybe you set some special hyper-parameters for vg autoindex
to overcome this error? (one hyper-parameter that might be related is -M --target-mem
, I use the default setting:1/2 of available, which is around 500G in my case, I am not sure if I need to increase it)
Thanks!
The Nature paper says that by using the software graph_peak_caller, we can do ChiP-Seq analysis on genome graphs. However, the original graph_peaker_caller paper says "Graph Peak Caller was developed by extending the methodologies and concepts from MACS2 to directed acyclic graphs (DAGs). The MACS2 algorithm can be divided into five steps... We have adopted each of these steps to work on DAGs", indicating that graph_peak_caller will only work on DAGs. Therefore, I have two questions:
Thanks!