psychoinformatics-de / datalad-hirni

DataLad extension for (semi-)automated, reproducible processing of (medical/neuro)imaging data
http://datalad.org
Other
5 stars 8 forks source link

Using `--to-git` breaks `git diff` functionality #195

Open manuelakuhn opened 3 years ago

manuelakuhn commented 3 years ago

I try to add a file to git instead of git-annex using the --to-git flag. This works fine for the initial commit but as soon as I modify the file, git diff does not work anymore because it claims that the file is stored in git-annex.

This only happens when creating the dataset using the hirni configuration template. When using a dataset created without a configuration template this problem does not exist and it works as expected.

Minimal example:

bpoldrack commented 3 years ago
  • open + edit file, e.g. add print statement in _rules function
  • safe change

How exactly did you call save after editing and with what datalad version? I suspect you didn't say --to-git again?

Generally it probably boils down to the .gitattributes file. The default set by the cfg_hirni procedure would put everything into annex except for explicitly specified exceptions. That default is chosen, so one doesn't accidentally put something in git directly that may contain sensitive information (that's hard to retract). As long as things are in annex, one does maintain access control on the storage side per file. So, you could fix this by editing .gitattributes to say that this file (or every .py file, or any text file, or anything under code/, etc.) should be kept in git.

Ultimately it's a datalad issue though (as in not hirni itself) w/o an incredibly obvious solution. The .gitattributes say it should go into annex, while it's already tracked in git directly. One could argue "Well, save should respect .gitattributes for new files only!". But this isn't really inline with git-annex itself. You could base the config on file size and annex would change how the file is tracked, when it reaches that size threshold. So - not instantly clear how datalad should deal with that situation. Again: This is assuming you didn't specify --to-git with the second call to save.

However, if there's nothing to discuss about the default config set by hirni from your point of view, a discussion of a potential change of behavior in save, need for docs, etc. should move over there.

manuelakuhn commented 3 years ago

There is no second datalad save call in this. I did not save the file to datalad at all after editing. With "safe changes" (sorry about the typo) I just mean save the change in the editor. And right after closing the editor git diff reports what is shown above.

manuelakuhn commented 3 years ago

Just as an additional remark: I first thought this was a datalad issue in general but I did not manage to reproduce it with another type of dataset. I tried: no configuration template and the yoda template. That is when I suspected it has something to do with the hirni template and that is also why I opened a issue here.

Ultimately it's a datalad issue though (as in not hirni itself) w/o an incredibly obvious solution. The .gitattributes say it should go into annex, while it's already tracked in git directly. One could argue "Well, save should respect .gitattributes for new files only!". But this isn't really inline with git-annex itself. You could base the config on file size and annex would change how the file is tracked, when it reaches that size threshold. So - not instantly clear how datalad should deal with that situation. Again: This is assuming you didn't specify --to-git with the second call to save.

This behaviour I encountered as well and you make a good point here. While this seems to not be so straight forward, it would quite improve usability.

However, if there's nothing to discuss about the default config set by hirni from your point of view, a discussion of a potential change of behavior in save, need for docs, etc. should move over there.

This you can judge best, so feel free to do so.

bpoldrack commented 3 years ago

I just mean save the change in the editor. And right after closing the editor git diff reports what is shown above.

Oof. I have a suspicion (where it's actually git-diff itself that triggers the change), but will have a closer look and try to reproduce. Can you give me the output of datalad wtf please? Versions of datalad, git and git-annex may play a role here.

manuelakuhn commented 3 years ago

I always try to reproduce the problem in a clean virtual environment with the newest datalad installation. That's why it is always in /tmp :-)

Here you go:

$ datalad wtf
# WTF
## configuration <SENSITIVE, report disabled by configuration>
## credentials 
  - keyring: 
    - active_backends: 
      - SecretService Keyring
      - PlaintextKeyring with no encyption v.1.0 at /home/nela/.local/share/python_keyring/keyring_pass.cfg
    - config_file: /home/nela/.config/python_keyring/keyringrc.cfg
    - data_root: /home/nela/.local/share/python_keyring
## datalad 
  - full_version: 0.14.1
  - version: 0.14.1
## dataset 
  - id: 24c8a5bf-1b95-4397-b9c4-47d40fd09a0a
  - metadata: <SENSITIVE, report disabled by configuration>
  - path: /tmp/datalad-tests/sourcedata
  - repo: AnnexRepo
## dependencies 
  - annexremote: 1.5.0
  - appdirs: 1.4.4
  - boto: 2.49.0
  - cmd:annex: 8.20201129
  - cmd:bundled-git: UNKNOWN
  - cmd:git: 2.26.3
  - cmd:system-git: 2.26.3
  - cmd:system-ssh: 8.3p1
  - exifread: 2.3.2
  - humanize: 3.3.0
  - iso8601: 0.1.14
  - keyring: 23.0.1
  - keyrings.alt: 4.0.2
  - msgpack: 1.0.2
  - mutagen: 1.45.1
  - requests: 2.25.1
  - wrapt: 1.12.1
## environment 
  - LANG: de_DE.UTF-8
  - PATH: /tmp/datalad-tests/virtualenv/bin:/home/nela/bin:/usr/share/Modules/bin:/home/nela/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/home/nela/bin
## extensions 
  - container: 
    - description: Containerized environments
    - entrypoints: 
      - datalad_container.containers_add.ContainersAdd: 
        - class: ContainersAdd
        - load_error: None
        - module: datalad_container.containers_add
        - names: 
          - containers-add
          - containers_add
      - datalad_container.containers_list.ContainersList: 
        - class: ContainersList
        - load_error: None
        - module: datalad_container.containers_list
        - names: 
          - containers-list
          - containers_list
      - datalad_container.containers_remove.ContainersRemove: 
        - class: ContainersRemove
        - load_error: None
        - module: datalad_container.containers_remove
        - names: 
          - containers-remove
          - containers_remove
      - datalad_container.containers_run.ContainersRun: 
        - class: ContainersRun
        - load_error: None
        - module: datalad_container.containers_run
        - names: 
          - containers-run
          - containers_run
    - load_error: None
    - module: datalad_container
    - version: 1.1.2
  - hirni: 
    - description: HIRNI workflows
    - entrypoints: 
      - datalad_hirni.commands.dicom2spec.Dicom2Spec: 
        - class: Dicom2Spec
        - load_error: None
        - module: datalad_hirni.commands.dicom2spec
        - names: 
          - hirni-dicom2spec
          - hirni_dicom2spec
      - datalad_hirni.commands.import_dicoms.ImportDicoms: 
        - class: ImportDicoms
        - load_error: None
        - module: datalad_hirni.commands.import_dicoms
        - names: 
          - hirni-import-dcm
          - hirni_import_dcm
      - datalad_hirni.commands.spec2bids.Spec2Bids: 
        - class: Spec2Bids
        - load_error: None
        - module: datalad_hirni.commands.spec2bids
        - names: 
          - hirni-spec2bids
          - hirni_spec2bids
      - datalad_hirni.commands.spec4anything.Spec4Anything: 
        - class: Spec4Anything
        - load_error: None
        - module: datalad_hirni.commands.spec4anything
        - names: 
          - hirni-spec4anything
          - hirni_spec4anything
    - load_error: None
    - module: datalad_hirni
    - version: 0.0.8
  - metalad: 
    - description: DataLad semantic metadata command suite
    - entrypoints: 
      - datalad_metalad.aggregate.Aggregate: 
        - class: Aggregate
        - load_error: None
        - module: datalad_metalad.aggregate
        - names: 
          - meta-aggregate
          - meta_aggregate
      - datalad_metalad.dump.Dump: 
        - class: Dump
        - load_error: None
        - module: datalad_metalad.dump
        - names: 
          - meta-dump
          - meta_dump
      - datalad_metalad.extract.Extract: 
        - class: Extract
        - load_error: None
        - module: datalad_metalad.extract
        - names: 
          - meta-extract
          - meta_extract
    - load_error: None
    - module: datalad_metalad
    - version: 0.2.1
  - neuroimaging: 
    - description: Neuroimaging tools
    - entrypoints: 
      - datalad_neuroimaging.bids2scidata.BIDS2Scidata: 
        - class: BIDS2Scidata
        - load_error: None
        - module: datalad_neuroimaging.bids2scidata
        - names: 
          - bids2scidata
    - load_error: None
    - module: datalad_neuroimaging
    - version: 0.3.1
  - webapp: 
    - description: Generic web app support
    - entrypoints: 
      - datalad_webapp.WebApp: 
        - class: WebApp
        - load_error: None
        - module: datalad_webapp
        - names: 
          - webapp
          - webapp
    - load_error: None
    - module: datalad_webapp
    - version: 0.3
## git-annex 
  - build flags: 
    - Assistant
    - Webapp
    - Pairing
    - Inotify
    - DBus
    - DesktopNotify
    - TorrentParser
    - MagicMime
    - Feeds
    - Testsuite
    - S3
    - WebDAV
  - dependency versions: 
    - aws-0.21.1
    - bloomfilter-2.0.1.0
    - cryptonite-0.25
    - DAV-1.3.4
    - feed-1.1.0.0
    - ghc-8.6.5
    - http-client-0.6.4
    - persistent-sqlite-2.9.3
    - torrent-10000.1.1
    - uuid-1.3.13
    - yesod-1.6.0.1
  - key/value backends: 
    - SHA256E
    - SHA256
    - SHA512E
    - SHA512
    - SHA224E
    - SHA224
    - SHA384E
    - SHA384
    - SHA3_256E
    - SHA3_256
    - SHA3_512E
    - SHA3_512
    - SHA3_224E
    - SHA3_224
    - SHA3_384E
    - SHA3_384
    - SKEIN256E
    - SKEIN256
    - SKEIN512E
    - SKEIN512
    - BLAKE2B256E
    - BLAKE2B256
    - BLAKE2B512E
    - BLAKE2B512
    - BLAKE2B160E
    - BLAKE2B160
    - BLAKE2B224E
    - BLAKE2B224
    - BLAKE2B384E
    - BLAKE2B384
    - BLAKE2BP512E
    - BLAKE2BP512
    - BLAKE2S256E
    - BLAKE2S256
    - BLAKE2S160E
    - BLAKE2S160
    - BLAKE2S224E
    - BLAKE2S224
    - BLAKE2SP256E
    - BLAKE2SP256
    - BLAKE2SP224E
    - BLAKE2SP224
    - SHA1E
    - SHA1
    - MD5E
    - MD5
    - WORM
    - URL
    - X*
  - local repository version: 8
  - operating system: linux x86_64
  - remote types: 
    - git
    - gcrypt
    - p2p
    - S3
    - bup
    - directory
    - rsync
    - web
    - bittorrent
    - webdav
    - adb
    - tahoe
    - glacier
    - ddar
    - git-lfs
    - httpalso
    - borg
    - hook
    - external
  - supported repository versions: 
    - 8
  - upgrade supported from repository versions: 
    - 0
    - 1
    - 2
    - 3
    - 4
    - 5
    - 6
    - 7
  - version: 8.20201129
## location 
  - path: /tmp/datalad-tests/sourcedata
  - type: dataset
## metadata_extractors 
  - annex (datalad 0.14.1): 
    - distribution: datalad 0.14.1
    - load_error: None
    - module: datalad.metadata.extractors.annex
    - version: None
  - audio (datalad 0.14.1): 
    - distribution: datalad 0.14.1
    - load_error: None
    - module: datalad.metadata.extractors.audio
    - version: None
  - bids (datalad-neuroimaging 0.3.1): 
    - distribution: datalad-neuroimaging 0.3.1
    - load_error: None
    - module: datalad_neuroimaging.extractors.bids
    - version: None
  - datacite (datalad 0.14.1): 
    - distribution: datalad 0.14.1
    - load_error: None
    - module: datalad.metadata.extractors.datacite
    - version: None
  - datalad_core (datalad 0.14.1): 
    - distribution: datalad 0.14.1
    - load_error: None
    - module: datalad.metadata.extractors.datalad_core
    - version: None
  - datalad_rfc822 (datalad 0.14.1): 
    - distribution: datalad 0.14.1
    - load_error: None
    - module: datalad.metadata.extractors.datalad_rfc822
    - version: None
  - dicom (datalad-neuroimaging 0.3.1): 
    - distribution: datalad-neuroimaging 0.3.1
    - load_error: None
    - module: datalad_neuroimaging.extractors.dicom
    - version: None
  - exif (datalad 0.14.1): 
    - distribution: datalad 0.14.1
    - load_error: None
    - module: datalad.metadata.extractors.exif
    - version: None
  - frictionless_datapackage (datalad 0.14.1): 
    - distribution: datalad 0.14.1
    - load_error: None
    - module: datalad.metadata.extractors.frictionless_datapackage
    - version: None
  - image (datalad 0.14.1): 
    - distribution: datalad 0.14.1
    - load_error: None
    - module: datalad.metadata.extractors.image
    - version: None
  - metalad_annex (datalad-metalad 0.2.1): 
    - distribution: datalad-metalad 0.2.1
    - load_error: None
    - module: datalad_metalad.extractors.annex
    - version: None
  - metalad_core (datalad-metalad 0.2.1): 
    - distribution: datalad-metalad 0.2.1
    - load_error: None
    - module: datalad_metalad.extractors.core
    - version: None
  - metalad_custom (datalad-metalad 0.2.1): 
    - distribution: datalad-metalad 0.2.1
    - load_error: None
    - module: datalad_metalad.extractors.custom
    - version: None
  - metalad_runprov (datalad-metalad 0.2.1): 
    - distribution: datalad-metalad 0.2.1
    - load_error: None
    - module: datalad_metalad.extractors.runprov
    - version: None
  - nidm (datalad-neuroimaging 0.3.1): 
    - distribution: datalad-neuroimaging 0.3.1
    - load_error: None
    - module: datalad_neuroimaging.extractors.nidm
    - version: None
  - nifti1 (datalad-neuroimaging 0.3.1): 
    - distribution: datalad-neuroimaging 0.3.1
    - load_error: None
    - module: datalad_neuroimaging.extractors.nifti1
    - version: None
  - xmp (datalad 0.14.1): 
    - distribution: datalad 0.14.1
    - load_error: None
    - module: datalad.metadata.extractors.xmp
    - version: None
## metadata_indexers 
## python 
  - implementation: CPython
  - version: 3.8.7
## system 
  - distribution: fedora/32
  - encoding: 
    - default: utf-8
    - filesystem: utf-8
    - locale.prefered: UTF-8
  - max_path_length: 285
  - name: Linux
  - release: 5.11.10-100.fc32.x86_64
  - type: posix
  - version: #1 SMP Thu Mar 25 14:26:30 UTC 2021