manulera / ShareYourCloning_frontend

The frontend application for ShareYourCloning
MIT License

New feature: import oligos from spreadsheet or database #149

Open dgruano opened 2 months ago

dgruano commented 2 months ago

The basic idea: I think it might be useful to allow users to upload a spreadsheet with oligos they have already designed. Cloning strategies can have multiple steps involving many oligos, and entering them by hand can be cumbersome and error-prone.
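A minimal sketch of what the import step could do, written in Python for illustration (the real feature would live in the JavaScript frontend). It assumes the spreadsheet is exported as CSV or TSV with `name` and `sequence` columns (header case is ignored) and that sequences may contain plain DNA plus IUPAC ambiguity codes. All of these are assumptions, not an agreed format. Invalid rows are reported rather than silently dropped, so users can fix typos in their sheet.

```python
import csv
import io

# Assumption: accept A/C/G/T plus N and the common IUPAC ambiguity codes.
ALLOWED_BASES = set("ACGTNRYSWKMBDHV")


def parse_primer_table(text: str, delimiter: str = ","):
    """Parse a spreadsheet export with (hypothetical) 'name' and 'sequence' columns.

    Returns (primers, errors): a list of {'name', 'sequence'} dicts and a list
    of messages describing the rows that were skipped.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    # Normalise the header so 'Name' / ' SEQUENCE ' are also accepted.
    reader.fieldnames = [f.strip().lower() for f in reader.fieldnames or []]
    primers, errors = [], []
    # Row 1 is the header, so data rows start at 2, as in a spreadsheet.
    for row_number, row in enumerate(reader, start=2):
        name = (row.get("name") or "").strip()
        sequence = (row.get("sequence") or "").strip().upper().replace(" ", "")
        if not name or not sequence:
            errors.append(f"row {row_number}: missing name or sequence")
            continue
        invalid = set(sequence) - ALLOWED_BASES
        if invalid:
            errors.append(f"row {row_number}: invalid characters {sorted(invalid)}")
            continue
        primers.append({"name": name, "sequence": sequence})
    return primers, errors
```

For example, `parse_primer_table("name,sequence\nfwd1,acgtacgt\nrev1,ACGU\n")` keeps `fwd1` (upper-cased) and reports row 3 because of the `U`.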

The crazy idea: Labs that have large collections of oligos may find it useful to import all of them and design their strategies within ShareYourCloning. This is a bigger change that may not align with how you envision SYC being used, or that should be implemented later within Genestorian. I imagine importing all oligos directly into the "Primers" page is not ideal, as they would be saved as items in the data model. The alternative I can think of would be to add a new tab, "Lab collection", where users import their whole database, which SYC saves in the user's browser. Then, on the "Primers" page, we could add a button like "Add from collection" that looks up an oligo by ID and imports it.
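The "Add from collection" step could look roughly like the sketch below, again in Python for illustration. Browser persistence (localStorage or IndexedDB) is left out. The `Primer` type, the `add_from_collection` helper and the lab ID format `oDG_0042` are hypothetical. The key point is that only the primers a user actually picks are copied into the session, each with a fresh session ID, while the collection itself stays outside the data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primer:
    id: int        # hypothetical ID of the primer inside the cloning strategy
    name: str
    sequence: str


def add_from_collection(session, collection, lab_id):
    """Copy one primer from the lab collection into the session primer list.

    `collection` maps the lab's own IDs to (name, sequence) pairs; `session`
    is the list of primers saved with the strategy. Returns a new list and
    leaves the input list unchanged.
    """
    try:
        name, sequence = collection[lab_id]
    except KeyError:
        raise KeyError(f"{lab_id!r} not found in the lab collection") from None
    # Next free session ID, so the copy cannot clash with existing primers.
    next_id = max((p.id for p in session), default=0) + 1
    return [*session, Primer(id=next_id, name=name, sequence=sequence)]
```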

The even crazier idea: Some repositories (e.g. Addgene) and companies (e.g. IDT, Eurofins) publish lists of commonly used primers. We could save a copy of these and add them as a "Standard primers" collection (as explained above). Otherwise, we could fetch and parse them on request (I haven't found an API, so we would have to scrape the HTML, which is not ideal).

What do you think, @manulera? If you think some of the above is useful and we agree on the best implementation, I can work on it!

manulera commented 2 months ago

Hi @dgruano, I have been thinking about something like this for a while as well. I have considered adding an "import from CSV" or "import from spreadsheet" feature. I thought that the primers tab could have two sub-tabs: "session primers" (what is there now) and "collection primers" (whatever you uploaded). For the collection, you would not want the full list displayed, so some pagination would be needed. I think it should not be too hard to do with MUI. I agree it's necessary to be able to access primers from a collection in one way or another.
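As a rough sketch of the pagination logic (Python for illustration; in the frontend this would feed an MUI table), the helper below filters the collection by name and returns one page plus the total match count. It follows the convention of MUI's `TablePagination`, whose `page` prop is zero-based and which needs a `count` and `rowsPerPage`. The function name and the search-by-name behaviour are assumptions.

```python
def page_of_primers(primers, query="", page=0, rows_per_page=25):
    """Return (rows, total) for one page of the collection table.

    `page` is zero-based, matching MUI TablePagination's `page` prop;
    `total` is the number of matching primers (the table's `count`).
    """
    query = query.strip().lower()
    # Case-insensitive substring search on the primer name.
    matches = [p for p in primers if query in p["name"].lower()]
    start = page * rows_per_page
    return matches[start:start + rows_per_page], len(matches)
```

With 60 primers and 25 rows per page, `page=2` returns the last 10 rows while `total` stays 60, so the table can still show "51-60 of 60".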

It would be great to give users a way to access a permanent collection as you propose. The only issue with browser storage is that it will not be in sync with the actual primer list, so users will have to update it every now and then (admittedly not too often, but you probably want to be cloning with the primers you just bought). Still, that's much better than uploading the primer list as a CSV every time you want to clone or refresh the page, so it would probably be a good first step.

Long term, we could also think of integrating with Google Sheets / OneDrive Excel spreadsheets. A master's student has developed an integration to read and write sequences to Google Drive, but it's trickier than I thought. Hopefully there will be a PR soon. The way we did it, you cannot just authenticate with Google and read your files. Essentially, every account that would use the website to access Drive files needs to be added to a list in the Google APIs interface, so it's just not very practical.

In summary, I am happy for you to give browser storage a go, bearing in mind that it's likely a temporary solution until we figure out something that stays in sync with what labs use to store their primers.

The commonly used primers, however, are a great idea, and would probably stay there long term (they could be on a separate sub-tab, "Standard primers", with name, sequence and maybe a reference to where they come from, such as a URL). For this, I would make a separate repository that does the scraping and saves the primers to a TSV file. The scripts can be run weekly or daily with a scheduled (cron) GitHub Action directly on GitHub. I don't know if you have done it before, but it's easy; I can share an example. For scraping with Python when you need to click on things, you can use Playwright. Here is an example that I am using to download a full list of plasmids from an Addgene kit (you have to click on a select element to display the full list, so simply scraping the initial document is not enough).
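The scraping itself depends on each provider's page, so it is not sketched here. The output step could look like the following, assuming the scraper yields dicts with `name`, `sequence` and `reference` keys (an assumed layout matching the columns suggested above). De-duplicating and sorting the rows means a scheduled run only produces a git diff when a provider actually changes its list.

```python
import csv
import io

FIELDS = ["name", "sequence", "reference"]


def primers_to_tsv(primers):
    """Serialise scraped standard primers to TSV text.

    Sequences are upper-cased, exact duplicates are dropped and rows are
    sorted, so the file is stable between runs with unchanged input.
    """
    unique = {(p["name"], p["sequence"].upper(), p["reference"]) for p in primers}
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(sorted(unique))
    return out.getvalue()
```

The cron workflow would then just run the scraper, write this text to the TSV file and commit it if it changed.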

dgruano commented 2 months ago

Great to hear you have also thought about it! I guess we can wait for the PR with the Drive integration, as it is much more useful to always have access to the latest version of the collection. By the way, have you done a small survey to know what people use for strains, plasmids and oligo collections? Maybe some of these could be easily integrated.

I'll start with the standard primers, then! One question, though: do you think primers used for sequencing or PCR genotyping (rather than cloning) are useful within ShareYourCloning? I wonder whether they could be added to the data model, so that you can share not only how you made a construct but also how you checked it! Let me know what you think; I can try to account for it when I implement the "Standard primers" feature (most primers I came across are used for sequencing, not cloning, hence the question).

Thanks for the resources on scraping; I've used HTML parsers and Selenium before. Once I have figured this out, it would be great to learn about the cron GitHub Action!

manulera commented 2 months ago

Hi @dgruano, yes, the "verification" aspect is on the horizon, although I have not paid much attention to it yet.

It should be possible to link to sequencing data (with the primer used, if Sanger), a gel image plus links to the primers (if PCR), etc.

We could have a separate tab (at the same level as sequences, primers, etc.) called "Verification", and an icon, like the eye icon on the sequences in the tree, to add a verification attachment. This could support different types of attachments and be linked to primers.
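One possible shape for such attachments, sketched in Python with hypothetical names (`VerificationAttachment`, the `kind` values and both helpers are assumptions, not part of the current data model): each attachment points at a sequence in the tree and optionally at the primers used, which is also where sequencing and genotyping primers would enter the data model. A consistency check catches attachments that reference primers missing from the session.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass
class VerificationAttachment:
    sequence_id: int                       # sequence in the tree being verified
    kind: Literal["sanger", "gel", "other"]
    file_name: str                         # e.g. an .ab1 trace or a gel image
    primer_ids: list[int] = field(default_factory=list)  # primers used, if any


def attachments_for_sequence(attachments, sequence_id):
    """All attachments shown under one sequence's verification icon."""
    return [a for a in attachments if a.sequence_id == sequence_id]


def missing_primer_ids(attachments, known_primer_ids):
    """Primer IDs referenced by attachments but absent from the session."""
    referenced = {pid for a in attachments for pid in a.primer_ids}
    return sorted(referenced - set(known_primer_ids))
```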

manulera commented 2 months ago

> By the way, have you done a small survey to know what people use for strains, plasmids and oligo collections? Maybe some of these could be easily integrated.

Not yet. I guess OneDrive / Google Sheets are the most popular, but that's based on my limited experience. What do you use in your lab? If they use something else (Benchling, some other ELN), I am not sure it would be easy.

dgruano commented 2 months ago

> What do you use in your lab? If they use something else (Benchling, some other ELN), I am not sure it would be easy.

It makes sense to me that some labs use online spreadsheets, as they are simple and easily shareable. Both labs I've worked with use FileMaker databases, though these are easily exported as spreadsheets. As for Benchling, I have used it for my own work but not as a database, so I agree it might not be easy!

dgruano commented 1 month ago

Quick thought about a possible workaround for interacting with Google Drive: maybe a Google Colab notebook could read the spreadsheet from the user's Drive and send a request to SYC? Very vague idea...