koszullab / GRAAL

(check out instaGRAAL for a faster, updated program!) This program is from Marie-Nelly et al., Nature Communications, 2014 (High-quality genome assembly using chromosomal contact data), also Marie-Nelly et al., 2013, PhD thesis (https://www.theses.fr/2013PA066714)
https://research.pasteur.fr/fr/software/graal-software-for-genome-assembly-from-chromosome-contact-frequencies/

Documentation is insufficient to get this software running. #1

cerebis closed this issue 8 years ago

cerebis commented 9 years ago

The documentation you have provided is incomplete.

The README refers to data files, which I assume are examples, but these are nowhere to be found, either on GitHub or in your paper's supplementary data.

When I try to run this program on my own FASTA sequences, I get the following error after selecting a FASTA file and hitting "build":

Traceback (most recent call last):
  File "main_window.py", line 353, in OnReturn
    self.main_window.pyramid = self.pyramid
AttributeError: 'LoaderWindow' object has no attribute 'pyramid'
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "main_window.py", line 89, in run
    pyramid = pyr.build_and_filter(self.base_folder, self.size_pyramid, self.factor)
  File "/Users/foobar/git/GRAAL/pyramid_sparse.py", line 36, in build_and_filter
    build(base_folder, init_size_pyramid, factor, min_bin_per_contig,)
  File "/Users/foobar/git/GRAAL/pyramid_sparse.py", line 168, in build
    shutil.copyfile(contig_info,current_contig_info)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 82, in copyfile
    with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: u'/Users/foobar/git/GRAAL/workdir/info_contigs.txt'

Your code is trying to copy a file that doesn't yet exist. If this is some sort of guide file that I need to create beforehand, it's necessary to tell me what goes into it and where to put it. Perhaps this is a hanging legacy of development, where you've always had this dependency satisfied but were unaware that it existed.
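To illustrate the kind of guard that would turn this crash into a readable message, here is a minimal Python 3 sketch. It assumes only what the traceback shows, namely that the build step expects info_contigs.txt inside the working folder. The helper name `missing_inputs` and its `required` parameter are hypothetical and are not part of GRAAL.

```python
import os
import tempfile


def missing_inputs(base_folder, required=("info_contigs.txt",)):
    """Return the required file names that are absent from base_folder.

    Hypothetical helper: the default list holds only the file named in
    the traceback; other inputs may be needed as well.
    """
    return [
        name
        for name in required
        if not os.path.isfile(os.path.join(base_folder, name))
    ]


# Demonstration on a throwaway folder standing in for the real workdir.
with tempfile.TemporaryDirectory() as workdir:
    before = missing_inputs(workdir)  # nothing has been created yet
    open(os.path.join(workdir, "info_contigs.txt"), "w").close()
    after = missing_inputs(workdir)  # the file now exists

print(before, after)
```

Calling something like this before the copy step would let the loader window report which inputs are absent instead of failing inside a worker thread.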

I am also only guessing which sequences are meant to be selected at this point. Is it the contigs? I would have a better idea if the next window were available, but GRAAL won't open until the pyramid is created.

So, please specify the file formats for:

Or explain how these can be satisfied, as I cannot get past this step.

MacCampbell commented 9 years ago

Hi,

Yes, the documentation is incomplete.

Are you still trying to get it to work? I can provide some omitted scripts and these files:

info_contigs.txt
abs_fragments_contacts_weighted.txt

I'm in the process of trying to get it to work myself. Perhaps we should fork it? Though I'm a long way from getting a test dataset off the ground.

meiers commented 9 years ago

Hi,

Same problem here. If I try to build a pyramid, the info_contigs.txt file is missing. Before reaching this point I had to fix several other issues, which I am going to report separately. Not to mention that this was a tough installation process. I clearly vote for the non-GUI version, too!

rkoszul commented 8 years ago

Hello,

The documentation provided in the README is targeted at the specific datasets provided as examples (only the corresponding raw reads were actually included in the original publication; the pyramid files had to be generated. Our bad). Here are the links to these datasets in case you wish to use them as is:

Trichoderma: https://www.dropbox.com/s/ira4j0wrz3sucl8/trichoderma_qm6a.tar.gz
Cerevisiae: https://www.dropbox.com/s/nx7tzhbzrp1l11t/cerevisiae_malaisyan_strain.tar.gz
Genomes: https://www.dropbox.com/s/7nbc3rotu3g8pt4/genomes.tar.gz

But if you want to use your own datasets, as of course you will, you may generate them using HiC-Box (https://github.com/koszullab/HiC-Box), which will perform the alignment and write the output in the appropriate format. You only need to perform the steps up to the "ready for computation" part.

The bulk of the data is contained in a file called abs_fragments_contacts_weighted.txt, which is a very simple sparse-array text file with three columns: id_frag_a, id_frag_b and n_contacts.
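As a rough illustration of that layout, the Python 3 sketch below reads such a three-column file into a dictionary keyed by fragment pair. Several details are assumptions and are not confirmed here: that the columns are whitespace- or tab-separated, that a header row may be present, and that fragment ids are integers. The function name `read_sparse_contacts` and the sample rows are made up for the example.

```python
import io
from collections import defaultdict


def read_sparse_contacts(handle):
    """Parse `id_frag_a id_frag_b n_contacts` rows into {(a, b): n}.

    Assumes whitespace-separated columns; rows that do not parse as
    (int, int, number), such as a header row, are skipped.
    """
    contacts = defaultdict(float)
    for line in handle:
        fields = line.split()
        if len(fields) != 3:
            continue  # blank or malformed line
        try:
            frag_a, frag_b = int(fields[0]), int(fields[1])
            n_contacts = float(fields[2])
        except ValueError:
            continue  # non-numeric row, e.g. a header
        contacts[(frag_a, frag_b)] += n_contacts
    return dict(contacts)


# Made-up sample data in the assumed layout (not taken from a real run).
example = io.StringIO(
    "id_frag_a\tid_frag_b\tn_contacts\n"
    "1\t2\t5\n"
    "1\t3\t2\n"
    "2\t3\t7\n"
)
contacts = read_sparse_contacts(example)
print(contacts)
```

For a real file, the same function could be given `open("abs_fragments_contacts_weighted.txt")` in place of the in-memory sample.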

Please bear in mind that setting up an appropriate environment for deploying GRAAL, while not difficult in principle, can be quite involved depending on your configuration; if you encounter any further problems, open an issue and we'll provide assistance.