mlcommons/croissant
Croissant is a high-level format for machine learning datasets that brings together four rich layers.
https://mlcommons.org/croissant
Apache License 2.0
452 stars, 41 forks
issues
Include a prefix to the beam pipeline's stages
#767
ccl-core
opened
4 hours ago
1
Remove label from the transform to allow for multiple reuse.
#766
marcenacp
opened
3 days ago
1
Add extra safeguards for basic auth
#765
goeffthomas
opened
6 days ago
0
Fix bug with jsonpath_rw and numpy arrays
#764
ccl-core
closed
1 week ago
1
Fix bug with repeated fields.
#763
ccl-core
closed
2 weeks ago
1
Croissant vocabulary for crawled datasets
#762
wumpus
opened
3 weeks ago
2
Define and set a unique user-agent
#761
goeffthomas
opened
3 weeks ago
1
Figure update dataverse
#760
luisoala
closed
3 weeks ago
1
Cannot load Kaggle datasets
#759
marcenacp
closed
3 weeks ago
1
add Dataverse to list of integrations
#758
pdurbin
closed
3 weeks ago
4
Make field more robust with None/nan repeated input
#757
ccl-core
closed
1 month ago
1
Release 1.0.10
#756
ccl-core
closed
1 month ago
1
BoundingBox feature defaults to crs 1.0
#755
ccl-core
closed
1 month ago
1
croissant cloud
#754
stubbi
opened
1 month ago
1
Remove editor tests
#753
ccl-core
closed
1 month ago
1
Add the web-of-science dataset (from parquet)
#752
ccl-core
closed
1 month ago
1
Fix references definition in Croissant spec
#751
benjelloun
opened
1 month ago
0
Deprecate Nested RecordSets in favor of repeated subField
#750
benjelloun
opened
1 month ago
0
Update README.md
#749
ccl-core
closed
1 month ago
1
New release mlcroissant==1.0.9
#748
ccl-core
closed
1 month ago
1
Check that the mapping is valid after setting it.
#747
marcenacp
closed
2 months ago
1
Uniform jsonQuery and jsonPath
#746
ccl-core
opened
2 months ago
0
Example of a dataset with nested fields.
#745
ccl-core
closed
1 month ago
1
Use ids to reference a field or a node.
#744
ccl-core
closed
2 months ago
1
Allow datasets with joins when generating with Apache Beam.
#743
marcenacp
closed
2 months ago
1
Fix discrepancies with the specs
#742
ccl-core
closed
2 months ago
1
Cache the result of each operation.
#741
marcenacp
closed
2 months ago
1
Keys in a RecordSet should be a list of ids references.
#740
ccl-core
closed
2 months ago
1
Semantic annotations / triplification
#739
benjelloun
opened
2 months ago
0
Lineage / provenance representation
#738
benjelloun
opened
2 months ago
2
Data-level annotations
#737
benjelloun
opened
2 months ago
2
Isolate a `.call()` method in operations.
#736
marcenacp
closed
2 months ago
1
New release mlcroissant==1.0.8.
#735
marcenacp
closed
2 months ago
1
Remove `pipeline` argument from ReadFromCroissant and use `beam.ptransform_fn`.
#734
marcenacp
closed
2 months ago
1
[Apache Beam] Handle branches for operations in Beam
#733
marcenacp
opened
2 months ago
0
[Apache Beam] Compute shard_sizes explicitly instead of relying on max_shard_size
#732
marcenacp
opened
2 months ago
0
More features around Beam.
#731
marcenacp
closed
2 months ago
1
Allow to parallelize operations in mlcroissant with Apache Beam.
#730
marcenacp
closed
2 months ago
1
Make nodes and operations pickable.
#729
marcenacp
closed
2 months ago
1
WIP - Performance investigation
#728
marcenacp
closed
2 months ago
2
Adding the levanti dataset.
#727
ccl-core
closed
2 months ago
1
Add splits to the huggingface-mnist dataset
#726
ccl-core
closed
2 months ago
1
Invalid object type for field "distribution"
#725
pdurbin
opened
2 months ago
3
Can the Huggingface croissant API endpoint read croissant.json metadata created by this tool?
#724
cboettig
opened
3 months ago
0
Documentation for the python tool, mlcroissant?
#723
cboettig
opened
3 months ago
0
Release 1.0.7
#721
ccl-core
closed
4 months ago
1
Move filters from Dataset init to `self.records`
#720
ccl-core
closed
4 months ago
1
Apply filters to a Hugging Face dataset to avoid repeating all variants.
#719
marcenacp
closed
4 months ago
1
Add more info links on how to do releases.
#718
ccl-core
closed
4 months ago
1
Fix broken Unit tests.
#717
ccl-core
closed
4 months ago
1