Closed thorstenwagner closed 5 months ago
Hi @thorstenwagner. I need more information before I can reproduce this.
/a_dataset.yaml # this is my root directory
Do you mean this is your system's root directory (the absolute path /), or your project's root directory (where the .dud directory lives)?
Now a.txt was updated.
Was it ever committed? How was it updated? Was it committed again after the update?
dud fetch
Is dud fetch necessary to reproduce this issue? If so, what's your remote config? Where's dud push in this scenario?
When I assume the simplest scenario, I can't reproduce this:
dud init
mkdir A
echo foo > A/a.txt
dud stage gen -o A/a.txt | tee a_dataset.yaml
dud stage add a_dataset.yaml
dud commit --copy
echo bar >> A/a.txt
dud commit
tree
rm A/a.txt
dud checkout
tree
cd A
rm a.txt
dud checkout
tree
Output:
Dud project initialized.
See .dud/config.yaml and .dud/rclone.conf to customize the project.
working-dir: .
outputs:
A/a.txt: {}
Added a_dataset.yaml to the index.
committing stage a_dataset.yaml
A/a.txt 4 B / 4 B 100% ?/s 1ms total
committing stage a_dataset.yaml
A/a.txt 8 B / 8 B 100% ?/s 1ms total
.
├── A
│ └── a.txt -> ../.dud/cache/ab/b4ca7eb554f159c4970bf8c7c723b724ff9e88cfeb5ee5eec6894f67bcd86b
├── a_dataset.yaml
└── run.sh
2 directories, 3 files
checking out stage a_dataset.yaml
A/a.txt 1 / 1 100% ?/s 0s total
.
├── A
│ └── a.txt -> ../.dud/cache/ab/b4ca7eb554f159c4970bf8c7c723b724ff9e88cfeb5ee5eec6894f67bcd86b
├── a_dataset.yaml
└── run.sh
2 directories, 3 files
checking out stage a_dataset.yaml
A/a.txt 1 / 1 100% ?/s 0s total
.
└── a.txt -> ../.dud/cache/ab/b4ca7eb554f159c4970bf8c7c723b724ff9e88cfeb5ee5eec6894f67bcd86b
1 directory, 1 file
It would be most helpful if you could provide a Bash script, like this one, which completely reproduces this issue.
/ is my project directory, not my system root :-) Let's see if I can make it reproducible somehow.
In principle, the file a.txt was updated successfully on a different computer. So I only tried to fetch the updated data on the other computer.
To give you an example with the actual data:
I'm interested in the file gt.txt:
tomotwin_evaluation_dataset is my project directory. Now I delete the file gt.txt and then run dud fetch; dud checkout:
Looks good. Now I navigate to the dataset directory, delete gt.txt and run fetch+checkout:
As you can see, the symlink is broken now.
Kudos to @mstabrin, he made a reproducible example:
mkdir -p dudtest/mydud
ln -rs dudtest/mydud mydud
cd mydud
dud init
mkdir A
echo foo > A/a.txt
dud stage gen -o A/a.txt | tee a_dataset.yaml
dud stage add a_dataset.yaml
dud commit --copy
echo bar >> A/a.txt
dud commit
tree
rm A/a.txt
dud checkout
tree
cd A
rm a.txt
dud checkout
tree
@mstabrin is wondering: Why is the symlink relative to root? ^^
Thanks for the example that I can run and reproduce!
Having a project directory be a symlink is an unexpected use case. It's certainly not something I was planning to support, simply because I hadn't thought of it. I will poke at this a bit further, and if a fix is simple I will add it. But I can't guarantee support for a symlinked project directory. Can you help me understand the motivation behind this pattern?
My hunch is that this is the issue. From the Go docs (emphasis my own):
Getwd returns a rooted path name corresponding to the current directory. If the current directory can be reached via multiple paths (due to symbolic links), Getwd may return any one of them.
Consequently, I'm not sure how easy this will be to fix. I'll keep looking at it, though.
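To make the "multiple paths" caveat concrete, here is a minimal shell sketch, independent of Dud, showing that a directory reached through a symlink has two valid names. The scratch layout (dudtest/mydud plus a mydud symlink) only mirrors the reproduction above, and the variable names are made up for illustration. Which of the two names a given program sees depends on how it asks for its working directory.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch layout mirroring the reproduction: a real directory
# and a symlink that points at it.
tmp="$(mktemp -d)"
tmp="$(cd "$tmp" && pwd -P)"        # normalize in case the temp dir itself sits behind a symlink
trap 'cd /; rm -rf "$tmp"' EXIT     # clean up the scratch directory on exit

mkdir -p "$tmp/dudtest/mydud"
ln -s dudtest/mydud "$tmp/mydud"    # relative target, resolved from $tmp

cd "$tmp/mydud"
logical="$(pwd -L)"     # the path as it was entered, symlink kept
physical="$(pwd -P)"    # the same directory with symlinks resolved

echo "logical:  $logical"
echo "physical: $physical"
```

Both names refer to the same directory, but a relative link target computed from one of them is not necessarily valid when followed from the other.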
It would be really nice to see this fixed. It just happened to me again. I keep my training data separate from the actual training runs and symlink the data instead. This time I cd'ed into the symlinked data folder and added some new data. Now it's broken again :-(
Btw, I haven't seen a commit for a while. Is dud still maintained? If not, I need to find an alternative, although I really love dud. It just does what I need ^^
Hi @thorstenwagner! As I mentioned above, using symlinked root folders is not recommended in Dud due to a limitation in Go's os.Getwd.
For now I will close this issue, but please use this thread to explain your project setup in more detail. If you do, I should be able to recommend an alternative configuration that mitigates this issue.
Regarding Dud being maintained, I have definitely not abandoned Dud; I still use it myself. But "maintained" means different things to different projects and people. What does maintained mean to you? Right now I am very busy with both work and life, and I haven't been able to commit to Dud (literally 😃) as much as I'd like. This being an open-source project, I am always happy to review and merge pull requests 😃.
One idea for a workaround on Linux systems: could you take the output of getcwd and then use os/exec to call realpath OUTCWD?
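As a rough illustration of what that suggestion would do, here is a shell sketch under stated assumptions. resolve_cwd is a hypothetical helper, not part of Dud, and the scratch layout (real/A reached via link) is invented for the demo. It passes the logical working directory through realpath, falling back to the POSIX pwd -P where realpath is not installed.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not part of Dud): print the current directory with
# every symlink resolved.
resolve_cwd() {
    if command -v realpath >/dev/null 2>&1; then
        realpath "$(pwd -L)"
    else
        pwd -P    # POSIX fallback when realpath is unavailable
    fi
}

tmp="$(mktemp -d)"
tmp="$(cd "$tmp" && pwd -P)"        # normalize the scratch path itself
trap 'cd /; rm -rf "$tmp"' EXIT     # clean up on exit

mkdir -p "$tmp/real/A"
ln -s real "$tmp/link"              # link -> real

cd "$tmp/link/A"
resolved="$(resolve_cwd)"

echo "logical:  $(pwd -L)"
echo "resolved: $resolved"
```

Whether normalizing the working directory this way would be enough to fix the broken symlink in Dud is not something this sketch can confirm. It only shows that the resolved path no longer depends on which route was taken into the directory.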
Assume the following:
/a_dataset.yaml # this is my root directory
/A/a.txt
Now a.txt was updated.
If I run the following commands from root /, everything works fine:
If I run the following commands from /A/, the symlink to a.txt is broken:
Best, Thorsten