Closed — mwang87 closed this issue 3 years ago.
Thank you Ming - this would be great! A better way to find the files on MassIVE would be an excellent change. I have a couple of questions:
Hey Will,
Yes, we keep these up to date every 24 hours, and additionally, every two weeks I download every single open-format mass spec file and generate a summary for it (i.e., MS1/MS2 counts and some metadata).
As for the API, I'm fixing a usability bug I noticed, but here is the web API endpoint that I personally use a lot:
I've got to document it better, but it's a pretty standard Datasette API.
Awesome - this sounds perfect then. I'll work on getting ppx switched over. Thanks!
Well, one thing I've been doing is using it as a cache, so I'd keep your current implementation and try using this as the first line of getting the data. If it errors out or no files are returned, then fall back on the FTP.
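A minimal sketch of that cache-first-with-FTP-fallback idea. The function names and signatures here are hypothetical, not part of ppx or the cache API; the fetchers are passed in so either backend can be swapped:

```python
def get_dataset_files(dataset_id, query_cache, query_ftp):
    """Return the file listing for a dataset.

    Tries the dataset cache first; falls back to the FTP listing if
    the cache errors out or returns no files. `query_cache` and
    `query_ftp` are placeholder callables taking a dataset ID and
    returning a list of file paths.
    """
    try:
        files = query_cache(dataset_id)
    except Exception:
        # Any cache failure (network error, bad response, etc.)
        # silently triggers the FTP fallback.
        files = []
    if not files:
        files = query_ftp(dataset_id)
    return files
```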
Hi Will,
Awesome work. I was just chatting with Wout, and he let me know that you're using the MassIVE FTP server to get all the files from MassIVE. If there are a lot of files, it can be incredibly slow to traverse. I ran into this exact problem on so many projects! So I created a dataset file cache (that also precomputes some other things). If you're interested, it might be a better way to get all the files for a MassIVE dataset:
https://gnps-datasetcache.ucsd.edu/datasette/database/filename
There are web APIs for all of it, since behind the scenes it's a SQLite database with Datasette on top of it.
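Since it's standard Datasette, the table URL above can be queried as JSON by appending `.json` and filtering with `column__exact` parameters (`_shape=array` returns a plain list of row objects). A small sketch of building such a query URL; note the `dataset` column name is an assumption about the cache's schema, not something documented here:

```python
import urllib.parse

# Table URL from the thread, with Datasette's .json suffix appended.
BASE = "https://gnps-datasetcache.ucsd.edu/datasette/database/filename.json"

def build_query_url(dataset_id):
    """Build a Datasette JSON query URL for one dataset's files.

    Uses Datasette's standard filter syntax; the "dataset" column
    name is an assumed schema detail.
    """
    params = {"dataset__exact": dataset_id, "_shape": "array"}
    return BASE + "?" + urllib.parse.urlencode(params)
```

The returned URL could then be fetched with any HTTP client and parsed as JSON.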
Let me know what you think!
Best,
Ming