mudiboevans / Group_1-Reproducible-CCBG-pipeline


Challenges faced in reproducing the analysis. #2

Open mudiboevans opened 4 years ago

mudiboevans commented 4 years ago

Kindly highlight the challenges you faced while reproducing the workflow (objective one).

TumuhimbisePeninah commented 4 years ago

The biggest challenge was resources. Even the available HPC resources were limited, and we could not reproduce the complete analysis.

Installation of Albacore failed completely.

TumuhimbisePeninah commented 4 years ago

It took us a while to get access to the data from the authors, and when we did, the data was too large to work with.

TumuhimbisePeninah commented 4 years ago

The complexity of the analysis was/is also a challenge as it took us some time to fully understand the various steps and processes involved.

TumuhimbisePeninah commented 4 years ago

We also had a lot of issues with permissions for some software and processes. Some of the resources kept requiring permissions, which was exhausting.

TumuhimbisePeninah commented 4 years ago

Online collaboration was a big challenge because of resources, distance, and timing, among others. Considering we had never met in person, it was a great opportunity but also a challenge.

TumuhimbisePeninah commented 4 years ago

Some software completely refused to work on some team members' platforms.

mudiboevans commented 4 years ago

Building the Docker image failed in some cases; software was not installed in the Docker container as expected.

Some opted for manual installation of the software and its dependencies. Installation of some tools was unsuccessful, e.g. nanopolish and canu.

The Dockerfile was not well organized; software like pandas required a higher version of pip to be installed.
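For anyone hitting the same pip problem, here is a minimal sketch of a check that could run before the package installs. The `19.3` threshold is a hypothetical placeholder (the thread does not say which pip version pandas actually needed), and `parse_version`, `needs_upgrade`, and `upgrade_command` are illustrative names, not part of the original pipeline.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def needs_upgrade(current, minimum):
    """True if `current` is older than `minimum` (missing parts count as 0)."""
    a, b = parse_version(current), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def upgrade_command():
    """Command that upgrades pip for the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", "pip"]


if __name__ == "__main__":
    MINIMUM_PIP = "19.3"  # hypothetical threshold, adjust to the real requirement
    try:
        installed = metadata.version("pip")
    except metadata.PackageNotFoundError:
        installed = "0"  # treat a missing pip as too old
    if needs_upgrade(installed, MINIMUM_PIP):
        print(f"pip {installed} is older than {MINIMUM_PIP}; try:")
        print(" ".join(upgrade_command()))
    else:
        print(f"pip {installed} looks recent enough")
```

The script only prints the upgrade command rather than running it, so it is safe to try on a shared server first.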

baker371 commented 4 years ago

The installation documents are not explanatory enough to run the analysis without consulting secondary information sources.

PMuchina commented 4 years ago

Working with Docker in some cases, e.g. on the ICIPE server that one team was working on, required permissions, which turned out to be a headache. The person running the account finally had to be on campus for Docker to run, which was inconvenient for them.
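A quick way to find out whether the current account can talk to Docker at all, before starting a long run, might look like the sketch below. It assumes that `docker info` exits with a non-zero status when the daemon is unreachable or access is denied; `command_succeeds` and `docker_usable` are hypothetical helper names.

```python
import shutil
import subprocess


def command_succeeds(argv, timeout=30):
    """Run a command quietly and report whether it exited with status 0."""
    if shutil.which(argv[0]) is None:
        return False  # executable is not on PATH
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def docker_usable():
    """Assumption: a failing `docker info` means this account cannot use Docker."""
    return command_succeeds(["docker", "info"])


if __name__ == "__main__":
    if docker_usable():
        print("Docker is reachable from this account")
    else:
        print("Docker is missing or this account lacks permission to use it")
```

This does not fix the permission problem itself, but it makes the failure visible up front so the team can ask the server administrator before committing to a run.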

mudiboevans commented 4 years ago

The link provided in the paper for additional data availability does not include raw fast5 data.

baker371 commented 4 years ago

I was not able to download all the data for analysis (12 barcodes) due to limited storage space on both the server and my personal PC. I managed to download only 1 barcode, which was over 5 GB of data.
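A rough pre-download check along these lines can show how many barcodes will fit before starting. It is only a sketch: the 5 GB per barcode figure is taken from the single barcode mentioned above and treated as a lower bound, and `has_room_for` and `barcodes_that_fit` are illustrative names.

```python
import shutil


def has_room_for(path, required_bytes, safety_margin=0.10):
    """True if the filesystem holding `path` has room for the data plus a margin."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes * (1 + safety_margin)


def barcodes_that_fit(path, bytes_per_barcode, total_barcodes=12):
    """Estimate how many barcodes fit in the free space, capped at the total."""
    free = shutil.disk_usage(path).free
    return min(total_barcodes, free // bytes_per_barcode)


if __name__ == "__main__":
    PER_BARCODE = 5 * 10**9  # assumption: about 5 GB each, likely an underestimate
    target = "."             # directory the data would be downloaded into
    print("Room for all 12 barcodes:", has_room_for(target, 12 * PER_BARCODE))
    print("Barcodes that fit now:", barcodes_that_fit(target, PER_BARCODE))
```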

PMuchina commented 4 years ago

One particular step in their guidelines, filter_reads, turned out to be a nightmare. Despite its successful installation, it still gave a syntax error that we could not resolve despite numerous attempts. We finally had to skip that step in order to proceed.

mudiboevans commented 4 years ago

Miniasm produced an empty output file when run on the entire data set.
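To stop the pipeline from carrying on silently with an empty assembly, a small guard between steps could help. This is a minimal sketch, not something from the original workflow; `require_nonempty` is a hypothetical helper and `assembly.gfa` is a placeholder file name.

```python
from pathlib import Path


def require_nonempty(path):
    """Return the file size in bytes, or raise if the file is missing or empty."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"{p} was not created")
    size = p.stat().st_size
    if size == 0:
        raise ValueError(f"{p} is empty; the previous step produced no output")
    return size


if __name__ == "__main__":
    try:
        # Placeholder name for the assembly output of the previous step
        print("assembly size in bytes:", require_nonempty("assembly.gfa"))
    except (FileNotFoundError, ValueError) as err:
        print("stopping:", err)
```

The guard does not explain why the output is empty, but it turns a silent failure into an explicit one at the step where it happened.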

mudiboevans commented 4 years ago

The base calling and assembly steps are computationally intensive as well as time consuming, e.g. base calling on the entire data set took 2 days on the ICIPE server.

Katunge commented 4 years ago

Despite successfully installing from the Dockerfile, it was only possible to run it on site due to permission rights.

The data set for the project was huge. Even after the authors shared the data, downloading it was a challenge for most of the team because of the size of the data files.