human-pangenomics / hpp_pangenome_resources

94 stars 3 forks source link

Analyzing Chip-seq data on genome graph #22

Open yihangs opened 11 months ago

yihangs commented 11 months ago

The Nature paper says that by using the software graph_peak_caller, we can do ChiP-Seq analysis on genome graphs. However, the original graph_peaker_caller paper says "Graph Peak Caller was developed by extending the methodologies and concepts from MACS2 to directed acyclic graphs (DAGs). The MACS2 algorithm can be divided into five steps... We have adopted each of these steps to work on DAGs", indicating that graph_peak_caller will only work on DAGs. Therefore, I have two questions:

  1. Did you try graph_peak_caller on DAG genome graphs or non-DAG genome graphs? If it is used on non-DAG genome graphs, do you think graph_peak_caller still works well?
  2. If it is used on DAG genome graphs, what kind of methods you used for converting original non-DAG genome graphs to DAGs?

Thanks!

glennhickey commented 11 months ago

Pinging @cgroza who did this work.

cgroza commented 11 months ago

Hi,

Please see

https://github.com/cgroza/personalized_genomes_gbio/issues/1

yihangs commented 11 months ago

Thank you so much! I have two follow-up questions:

  1. Did you use a decomposed VCF file of minigraph-cactus graph or Raw VCF? What are their differences?
  2. Did you use any filtering steps on mapped reads? I checked the code call_peaks.nf, and it seems that you didn't use vg filter, but instead used graph_peak_caller count_unique_reads. I don't quite understand this function, is it another way to do filtering?
cgroza commented 11 months ago

Hi,

  1. I used the raw VCF. For the meaning of decomposed VCF, see https://github.com/human-pangenomics/hpp_pangenome_resources#vcf-decomposition

  2. We chose not to filter the reads in this case, since many of the SVs have lower mappability with short reads libraries. For your use case, it may make sense to filter the reads. The number of unique reads is a parameter in GraphPeakCaller. Sometimes ChIP-seq will produce two identical reads, which this number excludes.

yihangs commented 11 months ago

Thanks! I will try it. One more question:

  1. It seems that vg mod --unfold can also help convert graphs to DAG, Have you tried that? If so, did you meet any issues?
cgroza commented 11 months ago

I recall making an attempt then giving up. I did not try since then. The real solution would be to write a peak caller that generalized to cyclic graphs.

Cristian Groza ------- Original Message ------- On Monday, October 2nd, 2023 at 4:25 PM, yihangs @.***> wrote:

Thanks! I will try it. One more question.

  • It seems that vg mod --unfold can also help convert graphs to DAG, Have you tried that? If so, did you meet any issues?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

yihangs commented 11 months ago

Hi,

I am reproducing the results according to https://github.com/cgroza/personalized_genomes_gbio/issues/1 via the following steps:

  1. download the raw vcf file, hprc-v1.1-mc-grch38.raw.vcf.gz.
  2. run the script python strip-nested.py hprc-v1.1-mc-grch38.raw.vcf.gz > hprc-v1.1-mc-grch38.raw.filter.vcf
  3. bgzip and index hprc-v1.1-mc-grch38.raw.filter.vcf
  4. run vg construct -S -f -r hg38.fa -v hprc-v1.1-mc-grch38.raw.filter.vcf > graph.vg

However, an error occurs in the last step, with error messages: https://github.com/vgteam/vg/issues/4109.

Did you meet this error? According to the error messages, I suspect that it is because vg construct cannot read super large SVs, hence, when you run strip-nested.py, did you provide a threshold on the variable max_len?

Thank you!

cgroza commented 11 months ago

Yes I did.

Try 10 kbp.

yihangs commented 11 months ago

Hi,

I use 10kbp to filter the vcf file, python strip-nested.py hprc-v1.1-mc-grch38.raw.vcf.gz 10000 > hprc-v1.1-mc-grch38.raw.filter.10kb.vcf

The vg construct step works fine, however, an error occurs when I do the index vg index -x hprc-v1.1-mc-grch38-filter-10kb.xg -g hprc-v1.1-mc-grch38-filter-10kb.gcsa -b ./tmp/ -p hprc-v1.1-mc-grch38-filter-10kb.vg -t 20 with error message:

Building XG index
Saving XG index to hprc-v1.1-mc-grch38-filter-10kb.xg
Generating kmer files...
Building the GCSA2 index...
InputGraph::InputGraph(): 10793035616 kmers in 1 file(s)
InputGraph::read(): Read 10793035616 16-mers from ./tmp//vg-OBo2ha/vg-kmers-tmp-5QqyQ1
InputGraph::readKeys(): 1730614098 unique keys
InputGraph::read(): Read 10793035616 16-mers from ./tmp//vg-OBo2ha/vg-kmers-tmp-5QqyQ1
InputGraph::readFrom(): 8906588953 unique start nodes
InputGraph::read(): Read 10793035616 16-mers from ./tmp//vg-OBo2ha/vg-kmers-tmp-5QqyQ1
PathGraph::PathGraph(): 10793035616 paths with 21586071232 ranks
PathGraph::PathGraph(): 321.658 GB in 1 file(s)
GCSA::GCSA(): Preprocessing: 4328.06 seconds, 394.435 GB
GCSA::GCSA(): Prefix-doubling from path length 16
GCSA::GCSA(): Step 1 (path length 16 -> 32)
PathGraph::prune(): 10793035616 -> 10777627852 paths (1728649963 ranges)
PathGraph::prune(): 700183376 unique, 0 redundant, 10077437423 unsorted, 7053 nondeterministic paths
PathGraph::prune(): 321.198 GB in 1 file(s)
PathGraph::read(): File 0: Read 10777627852 order-16 paths
PathGraphBuilder::write(): Size limit exceeded, construction aborted

Looks like it is still a memory issue. Did you do any special hyper-parameter settings on the index step?

Thanks!

cgroza commented 11 months ago

You may have more luck with vg autoindex, which is a built in function that will take a VCF and construct and index a graph for you.

yihangs commented 11 months ago

Thanks! Which workflow do you use, giraffe or map? would giraffe index .gbz more memory efficient?

cgroza commented 11 months ago

I used map. Giraffe should work too.

Cristian Groza

Sent with Proton Mail secure email.

------- Original Message ------- On Friday, October 6th, 2023 at 11:02 AM, yihangs @.***> wrote:

Thanks! Which workflow do you use, giraffe or map? would giraffe index .gbz more memory efficient?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

yihangs commented 11 months ago

Hi,

I tried autoindex but still get an error when constructing GCSA:

[IndexRegistry]: Constructing GCSA/LCP indexes.
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
━━━━━━━━━━━━━━━━━━━━
Crash report for vg v1.49.0-9-ge61a8a475 "Peschici"
Stack trace (most recent call last):
#23   Object "", at 0xffffffffffffffff, in 
#22   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff383818f9, in _start
#21   Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7f2d4bf17b96, in __libc_start_main
      Source "../csu/libc-start.c", line 310, in __libc_start_main [0x7f2d4bf17b96]
#20   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38355bea, in main
      Source "src/main.cpp", line 88, in main [0x55ff38355bea]
#19   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38ac5c97, in vg::subcommand::Subcommand::operator()(int, char**) const
    | Source "src/subcommand/subcommand.cpp", line 75, in operator()
      Source "/usr/include/c++/9/bits/std_function.h", line 688, in operator() [0x55ff38ac5c97]
        685:     {
        686:       if (_M_empty())
        687:    __throw_bad_function_call();
      > 688:       return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
        689:     }
        690: 
        691: #if __cpp_rtti
#18   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38af3c5c, in main_autoindex(int, char**)
      Source "src/subcommand/autoindex_main.cpp", line 360, in main_autoindex [0x55ff38af3c5c]
#17   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38fc398e, in vg::IndexRegistry::make_indexes(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)
      Source "src/index_registry.cpp", line 4253, in make_indexes [0x55ff38fc398e]
#16   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38facc49, in vg::IndexRegistry::execute_recipe(std::pair<std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, unsigned long> const&, vg::IndexingPlan const*, vg::AliasGraph&)
    | Source "src/index_registry.cpp", line 5204, in execute
    | Source "src/index_registry.cpp", line 5369, in operator()
      Source "/usr/include/c++/9/bits/std_function.h", line 688, in execute_recipe [0x55ff38facc49]
        685:     {
        686:       if (_M_empty())
        687:    __throw_bad_function_call();
      > 688:       return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
        689:     }
        690: 
        691: #if __cpp_rtti
#15   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38fa7759, in std::_Function_handler<std::vector<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > (std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, vg::AliasGraph&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&), vg::VGIndexes::get_vg_index_registry()::{lambda(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, vg::AliasGraph&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)#51}>::_M_invoke(std::_Any_data const&, std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*&&, vg::AliasGraph&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)
    | Source "/usr/include/c++/9/bits/std_function.h", line 286, in operator()
    |   284:       {
    |   285:    return (*_Base::_M_get_pointer(__functor))(
    | > 286:        std::forward<_ArgTypes>(__args)...);
    |   287:       }
    |   288:     };
      Source "src/index_registry.cpp", line 3651, in _M_invoke [0x55ff38fa7759]
#14   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38fa7397, in vg::VGIndexes::get_vg_index_registry()::{lambda(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)#50}::operator()(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) const [clone .isra.0]
      Source "src/index_registry.cpp", line 3601, in operator() [0x55ff38fa7397]
#13   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38f96f25, in vg::execute_in_fork(std::function<void ()> const&)
    | Source "src/index_registry.cpp", line 408, in operator()
      Source "/usr/include/c++/9/bits/std_function.h", line 688, in execute_in_fork [0x55ff38f96f25]
        685:     {
        686:       if (_M_empty())
        687:    __throw_bad_function_call();
      > 688:       return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
        689:     }
        690: 
        691: #if __cpp_rtti
#12   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38f94c98, in vg::VGIndexes::get_vg_index_registry()::{lambda(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)#50}::operator()(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) const::{lambda()#1}::operator()() const
      Source "src/index_registry.cpp", line 3613, in operator() [0x55ff38f94c98]
#11   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff39247b28, in gcsa::GCSA::GCSA(gcsa::InputGraph&, gcsa::ConstructionParameters const&)
      Source "src/gcsa.cpp", line 505, in GCSA [0x55ff39247b28]
#10   Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff3927da61, in gcsa::PathGraph::extend(unsigned long)
      Source "src/path_graph.cpp", line 1011, in extend [0x55ff3927da61]
#9    Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff3927cc60, in gcsa::PathGraphBuilder::sort(unsigned long)
      Source "src/path_graph.cpp", line 405, in sort [0x55ff3927cc60]
#8    Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff39279794, in gcsa::PathGraph::read(std::vector<gcsa::PathNode, std::allocator<gcsa::PathNode> >&, std::vector<unsigned int, std::allocator<unsigned int> >&, unsigned long) const
    | Source "src/path_graph.cpp", line 1079, in resize
      Source "/usr/include/c++/7/bits/stl_vector.h", line 692, in read [0x55ff39279794]
        689:       resize(size_type __new_size)
        690:       {
        691:    if (__new_size > size())
      > 692:      _M_default_append(__new_size - size());
        693:    else if (__new_size < size())
        694:      _M_erase_at_end(this->_M_impl._M_start + __new_size);
        695:       }
#7    Object "/opt/local/stow/vg-1.49.0/bin/vg", at 0x55ff38cca438, in std::vector<unsigned int, std::allocator<unsigned int> >::_M_default_append(unsigned long)
    | Source "/usr/include/c++/9/bits/vector.tcc", line 635, in _M_allocate
    |   633:          const size_type __len =
    |   634:        _M_check_len(__n, "vector::_M_default_append");
    | > 635:          pointer __new_start(this->_M_allocate(__len));
    |   636:          if _GLIBCXX17_CONSTEXPR (_S_use_relocate())
    |   637:        {
    | Source "/usr/include/c++/9/bits/stl_vector.h", line 343, in allocate
    |   341:       {
    |   342:    typedef __gnu_cxx::__alloc_traits<_Tp_alloc_type> _Tr;
    | > 343:    return __n != 0 ? _Tr::allocate(_M_impl, __n) : pointer();
    |   344:       }
    | Source "/usr/include/c++/9/bits/alloc_traits.h", line 443, in allocate
    |   441:       _GLIBCXX_NODISCARD static pointer
    |   442:       allocate(allocator_type& __a, size_type __n)
    | > 443:       { return __a.allocate(__n); }
    |   444: 
    |   445:       /**
      Source "/usr/include/c++/9/ext/new_allocator.h", line 114, in _M_default_append [0x55ff38cca438]
        111:        return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp), __al));
        112:      }
        113: #endif
      > 114:    return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp)));
        115:       }
        116: 
        117:       // __p is not permitted to be a null pointer.
#6    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29", at 0x7f2d4c7decea, in 
#5    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29", at 0x7f2d4c7ea7f4, in __cxa_throw
#4    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29", at 0x7f2d4c7ea570, in std::terminate()
#3    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29", at 0x7f2d4c7ea505, in 
#2    Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29", at 0x7f2d4c7df0a8, in 
#1    Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7f2d4bf36800, in abort
      Source "/build/glibc-OTsEL5/glibc-2.27/stdlib/abort.c", line 79, in abort [0x7f2d4bf36800]
#0    Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7f2d4bf34e97, in raise
      Source "../sysdeps/unix/sysv/linux/raise.c", line 51, in raise [0x7f2d4bf34e97]
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Please include this entire error log in your bug report!
━━━━━━━━━━━━━━━━━━━━

But then the program captures this error, does the pruning step and constructs the index again.

warning:[IndexRegistry] Child process 37623 failed with status 34304 representing exit code 134
[IndexRegistry]: Exceeded disk use limit while performing k-mer doubling steps. Rewinding to pruning step with more aggressive pruning to simplify the graph.
[IndexRegistry]: Pruning complex regions of VG to prepare for GCSA indexing with GBWT unfolding.
[IndexRegistry]: Constructing GCSA/LCP indexes.

It runs slow, and haven't finish with three days.

Did you also meet this situation? Or maybe you set some special hyper-parameters for vg autoindex to overcome this error? (one hyper-parameter that might be related is -M --target-mem, I use the default setting:1/2 of available, which is around 500G in my case, I am not sure if I need to increase it)

Thanks!