davidsebfischer closed this issue 3 years ago.
Which solution do you envision? Should we just generate a unique hash, e.g. MD5 each file and then derive a single hash from those?
Alternatively, we could use the names to build a consistent, unique scheme?
I'd prefer a unique scheme based on names; I would consider MD5 an orthogonal mechanism that could be applied to all data sets. Essentially we can use the same scheme as before and just need to replace the DOI with a constant string, for example.
@Zethson let's also decide this now. How about we structure this by source? E.g., we could create "no_doi_10x_genomics" as a DOI equivalent, meaning that their website identifies these data sets. If we find more such sources, we could add them similarly. This would also make sense considering how these data files are then deposited on disk: this way they will all lie together.
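A minimal sketch of what this scheme could look like, assuming the identifier is built by concatenating a DOI (or its per-source placeholder) with the data set name; the function name, signature, and separator here are illustrative assumptions, not the actual implementation:

```python
from typing import Optional


def build_dataset_id(doi: Optional[str], source: str, name: str) -> str:
    """Build a unique data set identifier.

    For data sets without a DOI, a constant per-source placeholder
    such as "no_doi_10x_genomics" stands in for the DOI, so data
    sets from the same source group together on disk.
    (Hypothetical helper for illustration only.)
    """
    doi_part = doi if doi is not None else f"no_doi_{source}"
    return f"{doi_part}_{name}"


# Example: a 10x Genomics data set without a DOI
print(build_dataset_id(None, "10x_genomics", "pbmc_10k_v3"))
# -> no_doi_10x_genomics_pbmc_10k_v3
```

Because the placeholder is a constant prefix per source, all such data sets sort next to each other in any directory listing, which is exactly the grouping effect described above.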
Yeah, that sounds reasonable. I like that it will result in a nice grouping.
This applies to all data sets currently in `d_nan`.