Open Sebatina opened 4 months ago
Hi Sebatina,
Thank you so much for your very detailed report. What you are describing is the expected behavior. All the data needed to reproduce the manuscript figures is provided in the GitHub repository, but only a smaller set of files is provided in the Docker repository because pulling the Docker images already takes a substantial amount of time. When the Docker image contained more data, pulling it onto a local machine took even longer.
However, if you would like to add data from the GitHub repository into the Docker repository, you can do this. You can clone the GitHub repository onto your local machine with the command git clone https://github.com/jackievaleri/BioAutoMATED.git BioAutoMATED and then upload files from that clone into the Jupyter interface as you normally would.
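If it helps to see which files the GitHub clone has that your Docker working copy lacks before uploading anything, a short Python sketch along these lines could list them. This is only an illustration, not part of BioAutoMATED: the function name files_missing_from and both directory paths in the usage comment are hypothetical, and it assumes you have a local copy of the container's files to compare against.

```python
from pathlib import Path


def files_missing_from(clone_dir, target_dir):
    """Return relative paths of files found under clone_dir but not under target_dir."""
    clone_root = Path(clone_dir)
    target_root = Path(target_dir)
    missing = []
    for path in clone_root.rglob("*"):
        if not path.is_file():
            continue
        relative = path.relative_to(clone_root)
        # Skip git's own bookkeeping directory; it is not project data.
        if ".git" in relative.parts:
            continue
        if not (target_root / relative).exists():
            missing.append(relative.as_posix())
    return sorted(missing)


# Hypothetical usage; adjust both paths to match your machine:
# for name in files_missing_from("BioAutoMATED", "docker_workdir"):
#     print(name)
```

The files it lists are the candidates to upload through the Jupyter interface.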
Please let me know if this helps address your question. I'm happy to provide additional support if needed.
Hi Jackievaleri,
Got it, thanks for the clarification!
I'll go ahead and clone the GitHub repository to access the additional files needed. Appreciate your guidance on this.
Great! I'm going to close this issue but please feel free to reach out if other questions come up.
Hi, I have a query regarding the feature extraction functionality in BioAutoMATED. My dataset comprises approximately 50,000 sequences stored in a single column. Unlike typical datasets, these sequences do not have any associated features.
Could you kindly advise on the appropriate approach to utilize BioAutoMATED for extracting features from these sequences as part of an AutoML pipeline? Any guidance or recommendations you could provide would be immensely helpful.
Thank you for reaching out about the feature extraction functionality in BioAutoMATED. Based on your description, BioAutoMATED may not be the ideal tool for your specific use case. BioAutoMATED is designed to map sequences to a single binary value, continuous value, or categorical value, which may not align with your need to extract features from 50,000 sequences with no associated features.
Our tool is optimized for mapping a single sequence to a single value, such as in cases where you have a specific sequence of interest (e.g., a protein sequence) and a corresponding value (e.g., immunogenicity of that protein). In this scenario, you would provide a CSV or Excel file with a column for sequences and a column for values, allowing BioAutoMATED to create a model based on the sequence-function relationship.
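As a rough illustration of the two-column input described above, the following Python sketch writes sequence-value pairs to a CSV file using only the standard library. The column names "sequence" and "value", the helper name write_sequence_value_csv, and the example rows are assumptions made for illustration; check the BioAutoMATED documentation for the column names it actually expects.

```python
import csv


def write_sequence_value_csv(path, pairs, sequence_column="sequence", value_column="value"):
    """Write (sequence, value) pairs to a two-column CSV file with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Header names are placeholders, not confirmed BioAutoMATED names.
        writer.writerow([sequence_column, value_column])
        for sequence, value in pairs:
            writer.writerow([sequence, value])


# Made-up example rows, not real measurements.
example_pairs = [("ATGCGTACGT", 0.82), ("TTGACCGTAA", 0.15)]

# Hypothetical usage:
# write_sequence_value_csv("my_sequences.csv", example_pairs)
```

Each row pairs one sequence with one value, which is the one-to-one mapping the tool is built around.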
However, for datasets with multiple sequences and no associated features, other tools may be more appropriate for feature extraction and AutoML pipelines. We recommend exploring tools specifically designed for handling large datasets of sequences without associated features. In particular, you may want to explore iLearnPlus, which has a robust set of feature extraction options for nucleic acid and protein sequences: https://ilearnplus.erc.monash.edu
Hello, as a newcomer to Docker, I attempted to pull the Docker repository for BioAutoMATED. However, I encountered an issue where certain files were missing from the pulled repository. This issue presents a challenge for me as I am still learning to navigate Docker environments and rely on having all necessary files available.
Steps to Reproduce:
I expected all necessary files and directories to be present in the pulled Docker repository. However, certain files or directories are missing, making it impossible to proceed with the installation and use of BioAutoMATED.
Details of Missing Items:
These are some of the files that I mentioned as missing.
Environment
Additional Information
As a beginner in Docker, I may not be aware of potential troubleshooting steps or alternative solutions to address this issue. I have attempted to pull the Docker repository multiple times, but the issue persists. Additionally, I have checked the repository on GitHub to verify that the missing files are indeed absent from the source.
Request for Assistance
Given my limited experience with Docker, I kindly request assistance in resolving this issue and obtaining the missing files. Any guidance or suggestions tailored to a newcomer's perspective would be greatly appreciated.
Thank you for your understanding and support.