srvk / DiViMe

ACLEW Diarization Virtual Machine
Apache License 2.0
32 stars 9 forks source link

Clean the VM #64

Closed MarvinLvn closed 5 years ago

MarvinLvn commented 6 years ago

Big spring fall clean ! What needs to be done :

*notes regarding this intermediate wrapper (from a conversation between Eric and Alex):

  1. tool cannot deal with long files, so they need to be cut up into smaller clips, then each clip gets processed, then the individual rttm's get reconcatenated. Note that Eric already wrote something for this. Note also that this is excellent for models that take local decisions -- i.e. all except diarization like diartk, which should try to label each individual speaker over the whole file. Let us make this the default for now.
  2. tool has massive model, so it would be better to load this only once and then process all the files. It is difficult to imagine a tool that would benefit from being called for each file separately, so let us make this the default for now.

Don't hesitate if you have ideas to improve the way we're dealing with the files hierarchy. It's now or never !

riebling commented 6 years ago

lots of tools break when they can't find things where they expect. I'm not sure it's fair to require tool creators (some no longer available, or some tools we may not have source code for) to make changes to their code to accomodate new folder names.

riebling commented 6 years ago

Config files normally found in /vagrant have gone missing in the reorganization. More than one tool uses the HTK feature config file (ERROR) [1] in configManager : cFileConfigReader::openInput : cannot open input file '/vagrant/MED_2s_100ms_htk.conf'! Also .theanorc (which is in the reorganized repository) and also a whole folder of configuration files named https://github.com/srvk/DiViMe/tree/master/conf which exists "outside the VM" for a reason: it makes it easier for nonexperts to change a configuration local to their machine, but then tools in the VM read the new configurations to work in a different way. conf is currently mostly for enhancements to the VM (the slurm batch queuing system for example) but I see @MarvinLavechin is using a new subfolder conf/vad. I propose that we preserve conf in the reorganization, and if we want to move forward with changing tool source code to work in new folder layouts, we alter the source of the 2 or more tools (much as this terrifies me) that depend on MED_2s_100ms_htk.conf to get their configs from /vagrant/conf/ instead of just /vagrant, so that at least that ugly MED_2s_100ms_htk.conf filename won't be there making the list of files look ugly :)

alecristia commented 6 years ago

lots of tools break when they can't find things where they expect. I'm not sure it's fair to require tool creators (some no longer available, or some tools we may not have source code for) to make changes to their code to accomodate new folder names.

For the past tools, I propose we either fix it ourselves or remove the tools. We are not committed to preserving all tools that are there. The goal is not to preserve the VM as it is now, but rather to build a sensible, well-organized platform that others can build on and use easily. Perhaps the changes we suggested with Marvin were not ideal for this, so we are very open to discussion of how best to do this.

alecristia commented 6 years ago

Config files normally found in /vagrant have gone missing in the reorganization. More than one tool uses the HTK feature config file (ERROR) [1] in configManager : cFileConfigReader::openInput : cannot open input file '/vagrant/MED_2s_100ms_htk.conf'! Also .theanorc (which is in the reorganized repository) and also a whole folder of configuration files named https://github.com/srvk/DiViMe/tree/master/conf which exists "outside the VM" for a reason: it makes it easier for nonexperts to change a configuration local to their machine, but then tools in the VM read the new configurations to work in a different way. conf is currently mostly for enhancements to the VM (the slurm batch queuing system for example) but I see @MarvinLavechin is using a new subfolder conf/vad. I propose that we preserve conf in the reorganization, and if we want to move forward with changing tool source code to work in new folder layouts, we alter the source of the 2 or more tools (much as this terrifies me) that depend on MED_2s_100ms_htk.conf to get their configs from /vagrant/conf/ instead of just /vagrant, so that at least that ugly MED_2s_100ms_htk.conf filename won't be there making the list of files look ugly :)

I think part of this comments point out a bug I introduced -- sorry about that.

And part is about adding back a folder I did not know about. Of course, we can use it, in which case it would be ideal if there was some documentation explaining what is in there and why, so that in the future people can re-use it.

riebling commented 6 years ago

Super! Moving on... (I've been working from home this morning since waking up a couple hours ago) I've now made enough changes to the new scripts in launcher/ (and possibly utils/) to get the tests to pass. This does not mean that they all work in the desired way yet, simply that they WORK. If it's OK to commit all my changes, you could use this as a starting point for "something that works" then make incremental changes to finish making them work with the new reorganized folder paths.

In particular, things that I changed in order to make the tests all pass, that may need more changes to work to fully work in the reorganized way:

Things that still need work:

So I'm mostly writing to ask if it's OK to commit my changes to test.sh that "work", and the launcher/ scripts I've modified to work with it, in hopes that development of the reorganized branch can continue with now a way to test tools again, and starting from these "working" ones, then make further refinements to them use new temp folder layout.

alecristia commented 6 years ago

yes, certainly! Please push at your convenience, I'll pull and take the relay.

macw commented 6 years ago

I agree. We want a well-organized system and there is no sense in having extraneous pieces. Sometimes people like to keep things in systems just in case they might be useful, but then those extraneous pieces end up causing organizational drag.

--Brian

On Nov 15, 2018, at 3:05 AM, Alex Cristia notifications@github.com wrote:

lots of tools break when they can't find things where they expect. I'm not sure it's fair to require tool creators (some no longer available, or some tools we may not have source code for) to make changes to their code to accomodate new folder names.

For the past tools, I propose we either fix it ourselves or remove the tools. We are not committed to preserving all tools that are there. The goal is not to preserve the VM as it is now, but rather to build a sensible, well-organized platform that others can build on and use easily. Perhaps the changes we suggested with Marvin were not ideal for this, so we are very open to discussion of how best to do this.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/srvk/DiViMe/issues/64#issuecomment-438952140, or mute the thread https://github.com/notifications/unsubscribe-auth/AFXzuaxzuULsY0YHq_5Kxuqlx2Ujy28Xks5uvSBbgaJpZM4X4RsU.

alecristia commented 6 years ago

Eric and myself discussed having a launcher and utils folders as separate repos versus within the divime folder. We decided for the latter because:

The only disadvantage is that this makes the divime repo heavier in size, but this does not seem to be an issue given that launchers and utils are all light files.

We also decide to add repos to PATH, which will allow very clean and easy to read code for calling tools: vagrant ssh -c launchMe.sh data

We also decide that launchers will be by default read-only, to avoid accidental editing

riebling commented 6 years ago

ok! not sure what all check boxes we can check off, but I just committed a Vagrantfile and updated some tools:

alecristia commented 6 years ago
riebling commented 6 years ago

one of the far above checkboxes goes away (compile selectively the docs based on tool failures)

alecristia commented 6 years ago

good point, that actually takes care of 2 problems - because now the online and local docs are the same. I'm adding a line to bootstraph.sh to build the local docs.

Also, I think the last unchecked box, "Inside each tool subfolder (e.g., tools/OpenSAT), there will be a wrapper that specifies the behavior to adopt regarding the files that need to be processed*" is no longer relevant, right? Since you're systematically having all the tools analyze file by file, so by the same token, this routine should break up long files into subfiles?