Open le-ander opened 2 years ago
It's public data, but they use Google Drive to store it. If I remember correctly, I used gdown (https://github.com/wkentaro/gdown) to download it programmatically.
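For reference, a minimal sketch of what a gdown-based download could look like. The helper names `drive_file_id` and `download_from_drive` are made up for illustration and are not part of sfaira or gdown; the only gdown call used is `gdown.download(url, output, quiet=...)`, and the share link and output path are up to the caller.

```python
import re


def drive_file_id(url):
    """Extract the file id from common Google Drive share-link shapes.

    Handles links of the form .../file/d/<id>/view and ...?id=<id>.
    """
    match = re.search(r"/d/([\w-]+)", url) or re.search(r"[?&]id=([\w-]+)", url)
    if match is None:
        raise ValueError(f"Could not find a Google Drive file id in: {url}")
    return match.group(1)


def download_from_drive(url, output):
    """Download a publicly shared Google Drive file to `output` using gdown."""
    # Imported lazily so the id parsing above works without gdown installed.
    import gdown  # third-party: pip install gdown

    # gdown expects a direct "uc?id=" style URL.
    direct_url = f"https://drive.google.com/uc?id={drive_file_id(url)}"
    return gdown.download(direct_url, output, quiet=False)
```

This only works for files shared publicly ("anyone with the link"), and large files may still hit Google Drive's download quota.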
Thanks for the info, Laura! :) Technically, we could expand the download capabilities of sfaira to handle gdrive links, though I'm not sure how common this is.
I just saw that the data in the h5ad they provide in the gdrive is not actually raw counts but log-normalised. It looks like they provide the raw counts here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE158055. Do you think you could update the dataloader to use the raw counts? In that case the automatic download would be solved as well. Thanks a lot in advance!
Good point! I just checked again and we do have raw counts in adata.raw from the gdrive file, but I can change it to the GEO files if that is more convenient :) However, we would then lose the default embedding etc.
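In case it helps others check the same thing, here is a rough sketch of the check, assuming the file has been loaded as an AnnData-like object with `.X` and an optional `.raw`. Both helper names are hypothetical, and the "looks like counts" test is only a heuristic (non-negative whole numbers in a small sample), not a guarantee.

```python
from itertools import islice


def pick_count_matrix(adata):
    """Return (matrix, source), preferring adata.raw.X when adata.raw is set."""
    raw = getattr(adata, "raw", None)
    if raw is not None:
        return raw.X, "raw"
    return adata.X, "X"


def looks_like_counts(values, sample=1000):
    """Heuristic: True if the first `sample` values are non-negative integers.

    `values` is any iterable of numbers, e.g. `matrix.data` for a scipy
    sparse matrix or `matrix.ravel()` for a dense numpy array.
    """
    for value in islice(values, sample):
        value = float(value)
        if value < 0 or not value.is_integer():
            return False
    return True


# Usage with anndata (assumption: the path below is a placeholder):
#   import anndata
#   adata = anndata.read_h5ad("path/to/file.h5ad")
#   matrix, source = pick_count_matrix(adata)
```

Log-normalised values fail the check almost immediately, since they are rarely whole numbers.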
Oooh, I did not check for adata.raw #retro @davidsebfischer what's your take? Switch to GEO and lose some metadata, or add gdrive download support?
I'd go for GEO in this case (as we have cell annotation in both): it's the more permanent store and we can live without the embedding. But maybe leave the gdrive link in a comment in the accompanying text file.
Alright, could you do that, Laura? :)
Si!
homosapiens_None_2021_None_renxianwen_001_10.1016/j.cell.2021.01.053
cannot be downloaded with sfaira because the download link is a Google Drive link, which does not support programmatic download. If this data was shared with the authors to be added to sfaira and be publicly available, we should probably ask them to put it in a different place (e.g. figshare) from where it can be programmatically downloaded. If the data is not meant to be shared with the public, we should drop the download link from the dataloader.
@lauradmartens or @davidsebfischer could you add some insight here? :)