Open lefos99 opened 1 year ago
Hi @lefos99, it looks like this might be a duplicate of #9085
Could you share the viztracer profile:

```bash
pip install viztracer
dvc exp run -v --temp --viztracer-depth=10
```
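For readers unfamiliar with viztracer, a roughly equivalent standalone invocation is sketched below; the output filename and the use of viztracer's own CLI (instead of the DVC-integrated `--viztracer-depth` option shown above) are assumptions for illustration:

```bash
pip install viztracer
# Trace the `dvc` entry point directly and write the profile to a JSON file
# (filename chosen arbitrarily for this sketch)
viztracer --max_stack_depth 10 -o dvc_exp_profile.json dvc exp run -v --temp
# Open the captured trace in viztracer's bundled browser-based viewer
vizviewer dvc_exp_profile.json
```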
> Could you share the viztracer profile:
>
> ```bash
> pip install viztracer
> dvc exp run -v --temp --viztracer-depth=10
> ```
Hey @daavoo,
Thanks for your reply. :bow:
Here is a screenshot from the viewer:
Please let me know if you would need anything else.
Hi @lefos99! Would you have time for a call to walk through your scenario so we can better understand your pain points and see how we can help either by improving performance or suggesting changes to your workflow?
For the record, I think this might be a duplicate / affected by #9085
> Hi @lefos99! Would you have time for a call to walk through your scenario so we can better understand your pain points and see how we can help either by improving performance or suggesting changes to your workflow?
Hey @dberenbaum, I would be more than happy to have a call. Because of vacation, I will be available from next week on. Here are some meeting suggestions: https://calendar.app.google/8VMCwMBNEVBKohEWA.
Booked a time. Looking forward to it!
## Bug Report

### Description
I have the following DVC structure (output of `dvc dag`), and my `dvc.yaml` (simplified version) looks like this:
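A minimal, hypothetical sketch of such a `dvc.yaml` is shown below; the two stage names and the parametrized output path come from this report, while the commands, dependencies, and the `train` output are made up for illustration:

```yaml
stages:
  generate_data:
    cmd: python src/generate_data.py    # hypothetical command
    deps:
      - src/generate_data.py
    outs:
      # heavy parametrized output holding the ~577k patch files
      - ${data_processing.paths.generated_data_dir}/patches/train
  train:
    cmd: python src/train.py            # hypothetical command
    deps:
      - ${data_processing.paths.generated_data_dir}/patches/train
    outs:
      - models/model.pt                 # hypothetical model artifact
```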
The stage `generate_data` has a heavy output, `${data_processing.paths.generated_data_dir}/patches/train`. It is pretty heavy as it contains a big number of patches (577,331 files). So to initiate an isolated experiment (by either queued experiments or temp experiments):

- `Collecting files and computing hashes in data/generated_datasets/default/patches/train` takes a lot of time (sometimes even 12 minutes). :turtle:
- `Collecting files and computing hashes ...` is executed multiple times, and I don't understand why. :thinking:

More precisely, by running in verbose mode (`-v`), I get the following waiting times:
- `Collecting files and computing hashes in data/generated_datasets/default/patches/train` - 1st time: (takes ~1 min with 14k file/s)
- `Collecting files and computing hashes in data/generated_datasets/default/patches/train` - 2nd time: (takes 12 mins with 700 file/s)
- `Collecting files and computing hashes in data/generated_datasets/default/patches/train` - 3rd time: (takes ~1 min with 14k file/s)
- `Collecting files and computing hashes in data/generated_datasets/default/patches/train` - 4th time: (takes ~1 min with 14k file/s)
- `Collecting files and computing hashes in data/generated_datasets/default/patches/train` - 5th time: (takes ~1 min with 14k file/s)
- `Collecting files and computing hashes in data/generated_datasets/default/patches/train` - 6th time: (takes ~1 min with 14k file/s)
- `Collecting files and computing hashes in data/generated_datasets/default/patches/train` - 7th time: (takes ~1 min with 14k file/s)
- the `train` stage is invoked :heavy_check_mark:
- after `train` is done, once again `Collecting files and computing hashes in data/generated_datasets/default/patches/train` (takes ~1 min with 14k file/s)
- `Collecting files and computing hashes in data/generated_datasets/default/patches/train` (takes ~1 min with 14k file/s)
- `Collecting files and computing hashes in data/generated_datasets/default/patches/train` (takes ~1 min with 14k file/s)
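One rough way to count these repetitions is to capture the verbose output and grep for the message; this is a sketch, assuming the progress messages end up in the redirected output:

```bash
# Re-run the experiment in verbose mode and keep the full log
dvc exp run -v --temp 2>&1 | tee exp.log
# Count how many times the hashing step was started
grep -c "Collecting files and computing hashes in data/generated_datasets/default/patches/train" exp.log
```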
### Expected

I would expect the operation `Collecting files and computing hashes in data/generated_datasets/default/patches/train` not to be invoked so many times!

### Environment information
Output of `dvc doctor`: