hubmapconsortium / portal-containers

Docker containers to pre-process data for visualization in the portal
MIT License

Is ".h5ad" precise enough? #13

Open mccalluc opened 4 years ago

mccalluc commented 4 years ago

@mruffalo : Looking back at this again: is it sufficiently precise to just look for the .h5ad extension, or is the same file format likely to be used for other kinds of data? If it's not sufficiently precise, could you suggest a longer extension (.something.h5ad) that you would produce and we would look for, and then assign this back to me?

mruffalo commented 4 years ago

As far as I know, .h5ad is only used for "HDF5 following the AnnData convention", so that should be precise enough.

mccalluc commented 4 years ago

@mruffalo : Sorry I didn't phrase that more clearly. I'm not worried about an ".h5ad" file not being the right kind of HDF5. My concern is that .h5ad is a general format and could conceivably be used to store something other than UMAP. For now, our pipeline starts when a file with a recognized extension is seen, so I think we'd want to distinguish between ".umap.h5ad" and ".something-else.h5ad".

(I'm not sure exactly where this logic is right now: I believe that Joel has done work in ingest-api that references my cwl in airflow-dev.)
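To make the proposal concrete, here is a minimal sketch of matching on a compound extension rather than on ".h5ad" alone. It is not the actual ingest-api or Airflow logic: the mapping `EXTENSION_TO_PIPELINE`, the function `pipeline_for`, and the pipeline name "umap" are all hypothetical names used for illustration.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical mapping from a compound extension to the pipeline it triggers.
# Only ".umap.h5ad" comes from the discussion; the pipeline name is assumed.
EXTENSION_TO_PIPELINE = {
    ".umap.h5ad": "umap",
}


def pipeline_for(filename: str) -> Optional[str]:
    """Return the pipeline name for a file, or None if it is not recognized.

    A bare ".h5ad" file is deliberately not recognized: the generic
    extension says nothing about what the AnnData file contains.
    """
    name = PurePosixPath(filename).name.lower()
    for extension, pipeline in EXTENSION_TO_PIPELINE.items():
        # Require a non-empty stem in front of the compound extension.
        if name.endswith(extension) and len(name) > len(extension):
            return pipeline
    return None
```

With this approach, "cluster.umap.h5ad" would start the UMAP pipeline, while "cluster.something-else.h5ad" and a plain "cluster.h5ad" would be ignored until a matching entry is added to the mapping.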