Open mudiboevans opened 4 years ago
The biggest challenge was resources: even the HPC resources available to us were limited, and we could not reproduce the complete analysis on them.
Installation of Albacore failed completely.
It took us a while to get access to the data from the authors, and when we did, the data was too large to work with.
The complexity of the analysis was/is also a challenge as it took us some time to fully understand the various steps and processes involved.
We also had a lot of issues with permissions for some software and processes, and some of the resources repeatedly asked for permissions, which was exhausting.
Online collaboration was a big challenge because of resources, distance, and timing, among other things. Since we had never met in person, it was a great opportunity but also a challenge.
Some software would not run at all on some team members' platforms.
Building the Docker image failed in some cases; software was not installed in the Docker container as expected.
Some opted for manual installation of the software and its dependencies. Installation of some software was unsuccessful, e.g. nanopolish and canu.
The Dockerfile was not well organized; software like pandas required a higher version of pip to be installed.
The installation documents are not explanatory enough to run the analysis without consulting secondary information sources.
Working with Docker in some cases, e.g. on the ICIPE server that one team was working on, required permissions, which turned out to be a headache. The person running the account finally had to be on campus for Docker to run, which was inconvenient for them.
The link provided in the paper for additional data availability does not include raw fast5 data.
I was not able to download all the data for analysis (12 barcodes) due to limited storage space on both the server and my personal PC. I managed to download only 1 barcode, which was over 5 GB of data.
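For anyone hitting the same storage limit, here is a minimal sketch of checking free disk space before starting a download. The function names, the 10% safety margin, and the per-barcode sizes are all illustrative assumptions, not something from the authors' workflow; only `shutil.disk_usage` is a real standard-library call.

```python
import shutil


def fits_on_disk(required_bytes, free_bytes, safety_margin=0.10):
    """Return True if required_bytes fits while keeping a fraction of free space unused."""
    usable = free_bytes * (1.0 - safety_margin)
    return required_bytes <= usable


def barcodes_that_fit(sizes_bytes, free_bytes, safety_margin=0.10):
    """Pick barcodes, in the order given, whose combined size still fits on disk."""
    chosen = []
    used = 0
    for name, size in sizes_bytes.items():
        if fits_on_disk(used + size, free_bytes, safety_margin):
            chosen.append(name)
            used += size
    return chosen


if __name__ == "__main__":
    # Free space on the filesystem holding the current directory.
    free = shutil.disk_usage(".").free
    # Placeholder sizes (assumption): 5 GiB per barcode, 12 barcodes.
    sizes = {f"barcode{i:02d}": 5 * 1024**3 for i in range(1, 13)}
    print("Barcodes that would fit:", barcodes_that_fit(sizes, free))
```

Running this on the target machine before downloading would at least show how many barcodes are realistic to fetch, under the assumed sizes.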
One particular step in their guidelines, filter_reads, turned out to be a nightmare. Despite its successful installation, it still gave a syntax error that we could not resolve despite numerous attempts. We finally had to skip that step in order to proceed.
Miniasm produced an empty output file when run on the entire data set.
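A quick way to catch an empty assembly early is to check the output before moving on to the next step. The sketch below assumes the assembly was saved in GFA format (which is what miniasm writes) and counts segment records, i.e. lines starting with `S`. The function names and the file name `assembly.gfa` are hypothetical.

```python
import os


def count_gfa_segments(lines):
    """Count segment ('S') records in an iterable of GFA lines."""
    return sum(1 for line in lines if line.startswith("S\t"))


def assembly_is_empty(path):
    """Return True if the file is missing, zero bytes, or has no segment records."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return True
    with open(path) as handle:
        return count_gfa_segments(handle) == 0


if __name__ == "__main__":
    # "assembly.gfa" is a placeholder path (assumption), not the authors' file name.
    if assembly_is_empty("assembly.gfa"):
        print("No contigs in the assembly output; check the overlap step and inputs.")
```

This does not explain why the output was empty, but it makes the failure visible right away instead of in a later step.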
The base calling and assembly steps are computationally intensive as well as time-consuming, e.g. base calling on the entire data set took 2 days on the ICIPE server.
Even after the Docker setup was installed successfully, it could only be run on site because of permission rights.
The data set for the project was huge; even after the authors shared the data, downloading it was a challenge for most of the team because of the size of the data files.
Kindly highlight the challenges faced while reproducing workflow, objective one