leileilin closed this issue 2 years ago
Do you mean, can you train without some of the attributes? You can replace every element in the "speaker" array with any string and the model will still be able to learn. As for the other keys, here are the ones that must be present for training (all the others are either legacy or needed only during preprocessing):
document_id: str                         # document name
cased_words: List[str]                   # words
sent_id: List[int]                       # word id to sent id mapping
part_id: int                             # document part id
speaker: List[str]                       # word id to speaker mapping
span_clusters: List[List[List[int]]]     # list of clusters,
                                         # each cluster is a list of spans
                                         # each span is a list of two ints (start and end word ids)
head_clusters: List[List[int]]           # list of clusters,
                                         # each cluster is a list of span heads
head2span: List[List[int]]               # list of training examples
                                         # each example is a list of three ints
                                         # head, span start, span end
                                         # this is used to train the model to predict spans from span heads
See this issue.
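To make the list of required keys concrete, here is a minimal Python sketch that checks one document from a .jsonlines file before training. It is not part of the repository: `REQUIRED_KEYS`, `check_document` and the example document are hypothetical names and values made up for illustration, and the span indices in the example are arbitrary (the thread does not say whether the end index is inclusive).

```python
from typing import Any, Dict, List

# Keys listed in the answer above as required for training.
REQUIRED_KEYS = (
    "document_id", "cased_words", "sent_id", "part_id",
    "speaker", "span_clusters", "head_clusters", "head2span",
)


def check_document(doc: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the basic shape is fine."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in doc]
    if problems:
        return problems

    n_words = len(doc["cased_words"])
    # sent_id and speaker are per-word mappings, so they need one entry per word.
    for key in ("sent_id", "speaker"):
        if len(doc[key]) != n_words:
            problems.append(f"{key} has {len(doc[key])} entries, expected {n_words}")

    # Each span is a list of two ints (start and end word ids).
    for cluster in doc["span_clusters"]:
        for span in cluster:
            if len(span) != 2:
                problems.append(f"span {span} is not a [start, end] pair")

    # Each head2span example is a list of three ints: head, span start, span end.
    for example in doc["head2span"]:
        if len(example) != 3:
            problems.append(f"head2span example {example} is not a triple")

    return problems


# Hypothetical document, for illustration only.
example_doc = {
    "document_id": "example_doc",
    "cased_words": ["Alice", "said", "she", "left", "."],
    "sent_id": [0, 0, 0, 0, 0],
    "part_id": 0,
    "speaker": ["-"] * 5,
    "span_clusters": [[[0, 1], [2, 3]]],
    "head_clusters": [[0, 2]],
    "head2span": [[0, 0, 1], [2, 2, 3]],
}

print(check_document(example_doc))
```

A check like this only validates the shape of the data; it says nothing about whether the clusters themselves are correct.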
Thanks. So you mean that the speaker attribute cannot be discarded, right?
It cannot be discarded, but it can be replaced with a placeholder value.
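As a rough sketch of what replacing the attribute with a placeholder could look like, the Python snippet below overwrites "speaker" with one placeholder string per word while leaving every other key untouched. The function names and the "unknown" placeholder are assumptions made for illustration; the repository does not provide them, and any string should work according to the answer above.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def fill_placeholder_speaker(doc: Dict[str, Any],
                             placeholder: str = "unknown") -> Dict[str, Any]:
    """Return a copy of doc whose speaker list holds one placeholder per word."""
    filled = dict(doc)  # shallow copy, the input document is not modified
    filled["speaker"] = [placeholder] * len(doc["cased_words"])
    return filled


def rewrite_jsonlines(lines: Iterable[str],
                      placeholder: str = "unknown") -> Iterator[str]:
    """Apply the placeholder to every non-empty JSON line and yield new lines."""
    for line in lines:
        if line.strip():
            doc = json.loads(line)
            yield json.dumps(fill_placeholder_speaker(doc, placeholder))


# Hypothetical one-document input without any speaker information.
source_lines = ['{"document_id": "d0", "cased_words": ["Hi", "there", "."]}']
converted = [json.loads(line) for line in rewrite_jsonlines(source_lines)]
print(converted[0]["speaker"])
```

The key is kept with the right length rather than dropped, which is the distinction the reply draws between discarding the attribute and replacing it.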
Thanks, I got it.
I have another question: I don't understand what the split_jsonlines function in convert_to_jsonlines.py is used for. Couldn't we just use the mv command to move the .jsonlines files from the temp dir to the data dir?
Hello, I'd like to ask about the .jsonlines file produced by running convert_to_jsonlines.py. Can the model still be trained successfully if some attributes in the jsonlines file are discarded, such as speaker or pos?