johnking opened this issue 7 years ago (status: Open)
Hi,
you raise a good point. In fact, I am not really satisfied with how multi-level formats are currently supported. On the one hand, the "compute engine", for example, is generic enough to support an arbitrary number of levels; on the other hand, the global parameters and command-line options support exactly three levels. Moreover, this support is implemented by replicating keys/names/variables for each level.
For all these reasons, I will probably rework the relevant code for multi-level formats in aeneas v2, and while doing that I will address your issue directly.
Unfortunately, this plan also means that I am not going to address your issue in the 1.x series, so you need to either process the id after the JSON file has been produced, or patch your local version of aeneas. In the latter case, you might want to modify
def format(self, syncmap)
in
https://github.com/readbeyond/aeneas/blob/master/aeneas/syncmap/smfjson.py#L53
or, even better, the code of
@property def json_string(self)
in
https://github.com/readbeyond/aeneas/blob/master/aeneas/syncmap/__init__.py#L248
(you need to keep track of the level in the recursive visit, and add the suitable "type": "value" to the dictionary which is appended in line 262)
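The recursive visit described above can be sketched outside aeneas as follows. This is a minimal illustration, not the aeneas code: `Node`, `LEVEL_NAMES`, and the depth-to-name mapping are hypothetical, assuming a three-level mplain input where depth 0 is a paragraph, 1 a sentence, and 2 a word.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-in for a node of the aeneas fragments tree; the real
# tree exposes children_not_empty and TextFragment objects instead.
@dataclass
class Node:
    identifier: str
    children: List["Node"] = field(default_factory=list)

# Assumed mapping from tree depth to level name for a three-level file.
LEVEL_NAMES = ["paragraph", "sentence", "word"]

def visit_children(node, level=0):
    """Recursively visit the tree, tagging each child with its level name."""
    output = []
    for child in node.children:
        # Fall back to the numeric depth if the tree is deeper than expected.
        name = LEVEL_NAMES[level] if level < len(LEVEL_NAMES) else str(level)
        output.append({
            "id": child.identifier,
            "type": name,
            "children": visit_children(child, level + 1),
        })
    return output
```

The level is passed down as an argument rather than stored on the nodes, so each recursive call knows its own depth without any extra bookkeeping.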
HTH,
Alberto Pettarin
On 07/01/2017 05:44 PM, johnking wrote:
Hi @readbeyond,
To get the type (paragraph, sentence, or word) of each fragment from the sync map JSON produced from multilevel plain text, we have to parse the "id" field, e.g. "p000014s000001w000002".
It would be nice to add one more field, "type", to the JSON data to avoid this post-processing.
If this does not make sense for this repository, could you please give me some hints on how to modify the code myself?
Thanks a lot,
-John
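For the other option, post-processing the JSON after aeneas has written it, a minimal sketch could infer the level from the shape of each id. It assumes ids follow the pattern shown in this issue ("p" plus digits, then an optional "s" part, then an optional "w" part); `fragment_type`, `add_types`, and `add_types_to_file` are hypothetical helpers, not part of aeneas.

```python
import json
import re

# Assumption: ids look like "p000014", "p000014s000001", "p000014s000001w000002".
ID_PATTERN = re.compile(r"^p\d+(s\d+)?(w\d+)?$")

def fragment_type(identifier):
    """Infer the level name from the last component of the id, or None."""
    match = ID_PATTERN.match(identifier)
    if match is None:
        return None
    if match.group(2):
        return "word"
    if match.group(1):
        return "sentence"
    return "paragraph"

def add_types(fragments):
    """Recursively add a "type" key to every fragment dict, in place."""
    for fragment in fragments:
        fragment["type"] = fragment_type(fragment["id"])
        add_types(fragment.get("children", []))
    return fragments

def add_types_to_file(path):
    """Load a sync map JSON file and return it with "type" fields added."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    add_types(data["fragments"])
    return data
```

Unlike a patch, this leaves aeneas untouched, at the cost of depending on the id format staying stable.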
@readbeyond Hi Alberto,
Thanks for your reply and for sharing the roadmap with us. Looking forward to v2.0!
Thanks again!
-John
@johnking Hi, you might want something like this:
# Replacement for the json_string property of the SyncMap class in
# aeneas/syncmap/__init__.py (indent it one level inside the class);
# it relies on that module's existing json and gf (aeneas.globalfunctions) imports.
@property
def json_string(self):
    """
    Return a JSON representation of the sync map.

    :rtype: string

    .. versionadded:: 1.3.1
    """
    def visit_children(node, level):
        """ Recursively visit the fragments_tree """
        # level is the depth of node's children: 0 for the top-level fragments
        output_fragments = []
        for child in node.children_not_empty:
            fragment = child.value
            text = fragment.text_fragment
            output_fragments.append({
                "id": text.identifier,
                "language": text.language,
                "lines": text.lines,
                "begin": gf.time_to_ssmmm(fragment.begin),
                "end": gf.time_to_ssmmm(fragment.end),
                "children": visit_children(child, level + 1),
                # new field: the depth of this fragment in the tree
                "type": level
            })
        return output_fragments
    output_fragments = visit_children(self.fragments_tree, 0)
    return gf.safe_unicode(
        json.dumps({"fragments": output_fragments}, indent=1, sort_keys=True)
    )
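With this patch, "type" holds the integer depth of each fragment rather than a name. A hypothetical consumer-side sketch, assuming a three-level mplain input so that 0, 1, and 2 mean paragraph, sentence, and word, could walk the output and pick out the word-level timings; `LEVEL_NAMES`, `iter_fragments`, and `words` are illustrative names, not aeneas APIs.

```python
import json

# Assumption: three-level input, so depth 0/1/2 = paragraph/sentence/word.
LEVEL_NAMES = {0: "paragraph", 1: "sentence", 2: "word"}

def iter_fragments(fragments):
    """Depth-first walk yielding (level_name, id, begin, end) tuples."""
    for fragment in fragments:
        level = fragment["type"]
        yield (LEVEL_NAMES.get(level, str(level)), fragment["id"],
               fragment["begin"], fragment["end"])
        yield from iter_fragments(fragment.get("children", []))

def words(syncmap_json):
    """Return only the word-level fragments from a sync map JSON string."""
    data = json.loads(syncmap_json)
    return [f for f in iter_fragments(data["fragments"]) if f[0] == "word"]
```

For example, on a made-up sync map with one paragraph containing one sentence of one word, `words(...)` returns a single tuple for that word, with "begin" and "end" kept as the strings aeneas writes.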
@readbeyond Hi Alberto, thanks for sharing this, I really appreciate it.
I am developing an app and want to reuse/extend the JSON structure; I will share my ideas once I finish the prototype.
Thanks again.
-John