Reduce test data size or provide externally

novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)

GNU General Public License v2.0


Reduce test data size or provide externally #70

Closed: a-slide closed this issue 3 years ago

a-slide commented 3 years ago

Hi @enovoa and @Huanle, cloning the repository takes a really long time, mainly because of the sheer size of the test dataset (5.3 GB). Git is not an efficient way to share such large files, so this makes deployment extremely slow. Would you be able to either considerably reduce the size of the test dataset or, if this large dataset is really necessary, provide it via an external provider (FTP, Amazon...)? Thanks

a-slide commented 3 years ago

Alternatively, distributing EpiNano via a package manager such as PyPI would also solve the issue and make the whole installation process much simpler.

a-slide commented 3 years ago

Hi @Huanle, I can see that you removed the datasets, which makes the current version considerably smaller. However, by default git clones the full history, including the earlier commits that still contain the datasets, so a clone will still download the massive files. This can be mitigated by specifying "--depth 1" when cloning. It might be worth mentioning this in the docs; otherwise people will still have to download several GB of data each time. Thanks
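
For reference, a minimal sketch of the shallow clone suggested above. The helper name `shallow_clone` is hypothetical, and the repository URL in the usage comment is assumed from the repository name (novoalab/EpiNano) rather than taken from the docs.

```shell
# Hypothetical helper: clone only the most recent commit of a repository,
# so blobs that exist only in older commits (e.g. removed datasets) are not fetched.
#   $1 = repository URL, $2 = target directory
shallow_clone() {
    git clone --depth 1 "$1" "$2"
}

# Usage (URL assumed from the repository name novoalab/EpiNano):
#   shallow_clone https://github.com/novoalab/EpiNano.git EpiNano
```

Note that a shallow clone has a truncated history; if the full history is needed later, `git fetch --unshallow` retrieves it (along with the old datasets).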