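Markdown links in this format are tokenized incorrectly: "*[Neubau](https://www.some-link.com)*". The closing parenthesis and the trailing asterisk end up inside the URL token.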
Code:
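from somajo import SoMaJo

# Assumption: the German model; the original snippet does not show how `somajo` was created.
somajo = SoMaJo("de_CMC")
text = "*[Neubau](https://www.some-link.com)*"
sentences = somajo.tokenize_text([text])
for sentence in sentences:
    for token in sentence:
        # One token per line: text, token class, extra info (e.g. SpaceAfter=No)
        print(f"{token.text}\t{token.token_class}\t{token.extra_info}")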
Returns:
* symbol SpaceAfter=No
[ symbol SpaceAfter=No
Neubau regular SpaceAfter=No
] symbol SpaceAfter=No
( symbol SpaceAfter=No
https://www.some-link.com)* URL
Should return something like this:
* symbol SpaceAfter=No
[ symbol SpaceAfter=No
Neubau regular SpaceAfter=No
] symbol SpaceAfter=No
( symbol SpaceAfter=No
https://www.some-link.com URL
) symbol SpaceAfter=No
* symbol SpaceAfter=No
I’ve added explicit handling for Markdown links, so this should be fixed now. One caveat: it will still fail if the link description contains square brackets or if the URL contains parentheses.
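To illustrate why those two cases are hard, here is a minimal sketch of a regex-based Markdown link matcher. It is not SoMaJo's actual implementation; the pattern `MD_LINK` and the helper `find_markdown_links` are hypothetical names made up for this example. It only assumes that a simple matcher excludes square brackets from the description and parentheses from the URL so that it can tell where each part ends.

```python
import re

# Hypothetical pattern, NOT taken from SoMaJo's source:
# the description may not contain square brackets, and the URL may not
# contain parentheses or whitespace.
MD_LINK = re.compile(r"\[([^\[\]]+)\]\(([^()\s]+)\)")


def find_markdown_links(text):
    """Return (description, url) pairs for simple inline Markdown links."""
    return MD_LINK.findall(text)


# The link from the report is matched, and ")" and "*" stay outside the URL.
simple = find_markdown_links("*[Neubau](https://www.some-link.com)*")

# Square brackets in the description: the simple pattern finds no link.
brackets = find_markdown_links("[Neubau [2021]](https://www.some-link.com)")

# Parentheses in the URL: the simple pattern finds no link either.
parens = find_markdown_links("[Neubau](https://www.some-link.com/a_(b))")

print(simple, brackets, parens)
```

With a pattern like this, the two failing inputs would fall back to whatever the generic URL rule does, which is where the original problem came from.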
Full code: https://colab.research.google.com/drive/16-CKdzp20Gin02emrLVeHfFFir2veK8M?usp=sharing