When merging data, the merger matches corresponding files by their file path. A large number of files in the same folder (or parent node) seems to slow down processing significantly (I am not even certain the process would ever terminate).
My own example, simplified: I have data from speakers of two age groups (adolescents, adults) and two speaker types (monolingual vs. bilingual). For each of them I have a file in two formats. Consider the following arrangement (I):
Trying to merge the imports of FORMAT_A_Importer and FORMAT_B_Importer does not terminate, or is at least extremely slow.
Another view on the data could be (II):
Arranging the data like this leads to successful merging. I am not sure what causes the difference, but I assume that pairing documents works more efficiently this way, or at least does not lock up. This is just a guess.
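For anyone who wants to try the same workaround, here is a rough sketch of how files could be spread over subfolders so that no single folder holds too many documents. It is only an illustration under assumptions: the file naming scheme (`adolescents_bilingual_01.xml`), the folder name `format_a`, and the helper names `plan_regrouping` and `apply_plan` are all hypothetical and not part of Pepper. Since the merger matches files by path, the same regrouping would have to be applied to both format folders so that corresponding files still end up on matching relative paths.

```python
from pathlib import Path
import shutil


def plan_regrouping(files, depth=2, sep="_"):
    """Map each file to a target path inside nested subfolders.

    Hypothetical naming assumption: a stem such as
    'adolescents_bilingual_01' starts with `depth` grouping parts
    (age group, speaker type) separated by `sep`.
    Files whose names have too few parts are left where they are.
    """
    plan = {}
    for f in files:
        src = Path(f)
        parts = src.stem.split(sep)
        if len(parts) <= depth:
            plan[src] = src  # name does not fit the assumed scheme
            continue
        # e.g. format_a/adolescents/bilingual/adolescents_bilingual_01.xml
        plan[src] = src.parent.joinpath(*parts[:depth]) / src.name
    return plan


def apply_plan(plan, dry_run=True):
    """Print the planned moves; only move files if dry_run is False."""
    for src, dst in plan.items():
        if src == dst:
            continue
        print(f"{src} -> {dst}")
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
```

Calling `apply_plan(plan_regrouping(Path("format_a").glob("*.xml")))` only prints the planned moves; passing `dry_run=False` actually moves the files. Whether this matches arrangement (II) closely enough to avoid the slowdown is untested on my side.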
During the non-terminating scenario (I), all processor cores run under full load until Pepper is stopped by a keyboard interrupt. Progress updates are printed, but as far as I can tell no progress is made (I am not entirely sure about that).