Suggestions from Eran, which I will incorporate in the near future:
It might be good to name allc files as XXX.allc instead of allc_XXX.txt. This would make it clear that allc is a standardized file format, like .fastq, .bed, .sam, etc.
I agree, will adopt this.
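As a rough illustration of the rename discussed above, here is a minimal sketch that maps an old-style name to the proposed one. The helper name new_allc_name and the handling of a trailing .gz are my own assumptions for illustration; this is not part of YAP or ALLCools.

```python
from pathlib import PurePosixPath


def new_allc_name(old_path: str) -> str:
    """Map 'allc_XXX.txt' (optionally with '.gz') to 'XXX.allc' (plus '.gz').

    Hypothetical helper: paths that do not follow the old convention
    are returned unchanged.
    """
    path = PurePosixPath(old_path)
    name = path.name

    # Remember a trailing .gz so it can be re-attached after renaming.
    suffix = ""
    if name.endswith(".gz"):
        name, suffix = name[: -len(".gz")], ".gz"

    if not (name.startswith("allc_") and name.endswith(".txt")):
        return old_path

    # Strip the 'allc_' prefix and '.txt' extension to get the sample name.
    sample = name[len("allc_"): -len(".txt")]
    return str(path.with_name(f"{sample}.allc{suffix}"))


renamed = new_allc_name("data/allc_cell1.txt.gz")
```

Only the file name changes; the directory part of the path is kept as-is.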
It might be good to define a compressed allc file format (e.g. .allcz?) or just .bgz?
All ALLC files generated by YAP are compressed with bgzip and indexed with tabix. I would prefer allc.gz, because .bgz would likely break many path wildcards that already exist in my code and in other people's code, causing additional bugs and issues.
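To make the wildcard concern concrete, here is a small sketch using Python's standard fnmatch module. The pattern "*.gz" and the file names are hypothetical examples, not taken from any existing pipeline.

```python
from fnmatch import fnmatchcase

# Hypothetical wildcard of the kind existing code might already use.
existing_pattern = "*.gz"

# The same sample under the two candidate compressed extensions.
names = ["cell1.allc.gz", "cell1.allc.bgz"]

# Only the .gz name matches; the .bgz file would be silently skipped.
matched = [n for n in names if fnmatchcase(n, existing_pattern)]
```

A file with the .bgz extension would not raise an error here; it would simply drop out of the match list, which is the kind of quiet breakage that is hard to notice.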
Does the .mcds file have a header or some way of storing metadata, e.g. which reference was used for quantification, etc?
Yes. MCDS is HDF5-based, so technically it can store any metadata. I will make a note of this and add more metadata in the “allcools generate-mcds” function.