uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.
How to make InputValueSwitch context-aware when missing type information in rows #478
Hi @jbax, thank you for this very nice library! I have a (simple? stupid?) question about using it in an edge case...
I have a TSV file like this (much shortened for readability):
As you can see, the file format (historically grown, likely unchangeable) contains three different schemas, one each for MetadataBlock, DatasetFieldType and ControlledVocabularyValue. The actual entries carry no type identifier; they are interpreted "in context" of the preceding section header line.
Now I would like to use your lib for data binding, but I cannot figure out how to make InputValueSwitch context-aware... I'd be glad for any help on this. Thank you! :pray:
(Do I need to implement my own ContextSwitch extends AbstractProcessorSwitch, since overriding switchRowProcessor() from AbstractInputValueSwitch isn't possible, or is there a simpler solution?)
Here's the Java code I have so far:
import com.univocity.parsers.common.processor.BeanListProcessor;
import com.univocity.parsers.common.processor.InputValueSwitch;
import com.univocity.parsers.tsv.TsvParser;
import com.univocity.parsers.tsv.TsvParserSettings;

import java.io.File;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * This class will parse a given TSV file for Metadata Blocks, Dataset Fields and Controlled Vocabularies.
 * You may fetch the different parts and use them in tests or to update the database of a real instance.
 */
public class TsvMetadataBlockParser {

    MetadataBlock metadataBlock;
    final List<DatasetFieldType> datasetFields = new ArrayList<>();
    final List<ControlledVocabularyValue> controlledVocabularyValues = new ArrayList<>();

    final TsvParser parser;
    // One bean processor per schema found in the file
    final BeanListProcessor<MetadataBlock> metadataBlockProcessor = new BeanListProcessor<>(MetadataBlock.class);
    final BeanListProcessor<DatasetFieldType> datasetFieldProcessor = new BeanListProcessor<>(DatasetFieldType.class);
    final BeanListProcessor<ControlledVocabularyValue> controlledVocabularyProcessor = new BeanListProcessor<>(ControlledVocabularyValue.class);

    /**
     * Create an input switch based on the first column.
     * Will contain #metadataBlock, #datasetField or #controlledVocabulary for switching context
     */
    final InputValueSwitch inputSwitch = new InputValueSwitch(0);

    public TsvMetadataBlockParser() {
        // Configure InputSwitch
        this.inputSwitch.addSwitchForValue("#metadataBlock", metadataBlockProcessor);
        this.inputSwitch.addSwitchForValue("#datasetField", datasetFieldProcessor);
        this.inputSwitch.addSwitchForValue("#controlledVocabulary", controlledVocabularyProcessor);
        this.inputSwitch.setDefaultSwitch(metadataBlockProcessor); // <- necessary as failing without, but also causing the headaches...

        TsvParserSettings settings = new TsvParserSettings();
        settings.setProcessor(inputSwitch);
        settings.setHeaderExtractionEnabled(true);
        // TODO: add error handler via settings.setProcessorErrorHandler()

        this.parser = new TsvParser(settings);
    }

    public void readTsv(File tsvFile) {
        // Do the parsing...
        parser.parse(tsvFile, StandardCharsets.UTF_8);

        System.out.println(datasetFieldProcessor.getHeaders()); // -> null
        System.out.println(datasetFieldProcessor.getBeans()); // -> empty list
    }
}
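To make the desired "context-aware" behaviour concrete, here is a minimal plain-Java sketch of the dispatch logic. It deliberately does not use the uniVocity API. It only shows the state that a switch would have to keep: the last section header seen decides where each following row goes. The class and method names (SectionedTsvReader, Section, parse) are hypothetical, and the sketch assumes that header lines start with '#' in the first column while data rows never do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Groups the rows of a multi-section TSV by the most recent "#section" header line. */
final class SectionedTsvReader {

    /** Column names and data rows of one section (hypothetical helper type). */
    static final class Section {
        final String[] headers;
        final List<String[]> rows = new ArrayList<>();

        Section(String[] headers) {
            this.headers = headers;
        }
    }

    /**
     * Returns one Section per header line, keyed by the header's first column without the '#'.
     * Assumption: header lines start with '#', data rows do not.
     */
    static Map<String, Section> parse(List<String> lines) {
        Map<String, Section> sections = new LinkedHashMap<>();
        Section current = null; // the "context": the section of the last header seen

        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] row = line.split("\t", -1); // -1 keeps trailing empty columns
            if (row[0].startsWith("#")) {
                // Header line: switch context and remember the column names.
                current = new Section(row);
                sections.put(row[0].substring(1), current);
            } else if (current != null) {
                // Data line without type identifier: it belongs to the current section.
                current.rows.add(row);
            } else {
                throw new IllegalArgumentException("Data row before any section header: " + line);
            }
        }
        return sections;
    }
}
```

Calling SectionedTsvReader.parse(...) with the lines of such a file yields one entry per section, each with its own headers and rows. The same "remember the last header" state could presumably live inside a custom switch along the lines of the ContextSwitch extends AbstractProcessorSwitch idea from the question; that part is not shown here and would need checking against the library's API.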