@yarikoptic Your dandi pytest fixture writes NWB files without caching the spec. The export call caches the spec by default. I believe that explains all of the diff. If you want to export without caching the spec, you currently cannot do that using pynwb but we are going to remedy that in a quick bugfix to pynwb.
coolio, thanks @rly for quick response! And confirming on above example that we would get the same size and only id changed as requested
❯ /tmp/simple2.py /tmp/simple2.nwb /tmp/simple2-copy.nwb && ls -l /tmp/simple2.nwb /tmp/simple2-copy.nwb
Copying /tmp/simple2.nwb /tmp/simple2-copy.nwb using pywnb 2.5.0.post0.dev15
Now reading /tmp/simple2-copy.nwb
/tmp/simple2.py /tmp/simple2.nwb /tmp/simple2-copy.nwb 3.32s user 2.43s system 229% cpu 2.510 total
-rw-rw-r-- 1 yoh yoh 19664 Sep 6 14:48 /tmp/simple2-copy.nwb
-rw-rw-r-- 1 yoh yoh 19664 Sep 5 15:18 /tmp/simple2.nwb
❯ diff -Naur <(h5dump /tmp/simple2.nwb) <(h5dump /tmp/simple2-copy.nwb)
--- /proc/self/fd/18 2024-09-06 14:48:31.938598041 -0400
+++ /proc/self/fd/19 2024-09-06 14:48:31.938598041 -0400
@@ -1,4 +1,4 @@
-HDF5 "/tmp/simple2.nwb" {
+HDF5 "/tmp/simple2-copy.nwb" {
GROUP "/" {
ATTRIBUTE "namespace" {
DATATYPE H5T_STRING {
@@ -45,7 +45,7 @@
}
DATASPACE SCALAR
DATA {
- (0): "154bbc4f-4276-47db-bac9-f7cdc8880aa4"
+ (0): "c8b730fc-f3bf-4619-8069-c66f5ff0a9aa"
}
}
GROUP "acquisition" {
@@ -183,7 +183,7 @@
}
DATASPACE SCALAR
DATA {
- (0): "db410d65-a49a-4bd8-8ec9-ad6076d272e7"
+ (0): "eb09c10a-6ac9-461b-bb44-5bccd2551a3b"
}
}
DATASET "date_of_birth" {
Now I wonder: how can I discover whether the original file had the spec cached, so that I export without caching only if the original didn't have it cached?
The spec is cached in the HDF5 NWB file if the root of the HDF5 file contains an attribute named ".specloc" (the value of which is set to "/specifications" to indicate that the cached spec is in the specifications group).
Alternatively, you can run pynwb.NWBHDF5IO.get_namespaces(path), which returns an empty dict if there are no cached namespaces.
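A minimal sketch of the attribute check described above, assuming h5py is installed. The helper names attrs_indicate_cached_spec and has_cached_spec are hypothetical and not part of pynwb or hdmf; only the ".specloc" attribute name comes from this thread.

```python
from collections.abc import Mapping

# Attribute name quoted in the comment above; its presence on the root
# of the file is taken to mean that the spec is cached.
SPECLOC_ATTR = ".specloc"


def attrs_indicate_cached_spec(root_attrs: Mapping) -> bool:
    """Return True if the root attributes include the spec-location marker."""
    return SPECLOC_ATTR in root_attrs


def has_cached_spec(path: str) -> bool:
    """Open an HDF5/NWB file read-only and check its root attributes."""
    # h5py is a third-party dependency; it is imported lazily so that the
    # pure helper above stays usable (and testable) without it.
    import h5py

    with h5py.File(path, "r") as f:
        return attrs_indicate_cached_spec(f.attrs)
```

If the explanation above holds, has_cached_spec("/tmp/simple2.nwb") should be False for the fixture-written file and True for an export that cached the spec. The result could then decide whether to request spec caching on export, once pynwb offers that option.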
I believe this issue has been resolved. @yarikoptic please reopen if not.
Follow up to #1186 as initially observed while troubleshooting it for
If we use the same script as provided in #1186 with a non-broken hdmf 3.14.3, we get
so you can see that the "copied" file is 189k while the original is just 19k. Is that expected/desired/unavoidable?
Output of diff -Naur <(h5dump /tmp/simple2.nwb) <(h5dump /tmp/simple2-copy.nwb): http://www.oneukrainian.com/tmp/simple2-h5dump.diff
The original file is produced using this pytest fixture: https://github.com/dandi/dandi-cli/blob/HEAD/dandi/tests/fixtures.py#L101
PS feel welcome to reassign to pynwb if the issue is there.