solidify / jira-azuredevops-migrator

Tool to migrate work items from Atlassian Jira to Microsoft Azure DevOps/VSTS/TFS.
MIT License

Detected unicode characters, removed. #227

Closed gekap closed 4 years ago

gekap commented 4 years ago

Hello,

When I start the import, I get the warnings below.

[W][16:28:50] Detected unicode characters, removed.
[W][16:28:50] Failed to load 'DISSIM-1649.json' (perhaps not a migration file?).
[W][16:28:50] Detected unicode characters, removed.
[W][16:28:50] Failed to load 'DISSIM-1651.json' (perhaps not a migration file?).

As a result, the two JSON files are not imported into Azure DevOps.

How can I import them?

madkoo commented 4 years ago

@gekap can you share one file for testing purposes?

gekap commented 4 years ago

DISSIM-1649.zip

gekap commented 4 years ago

Hello,

Is there any update?

madkoo commented 4 years ago

@gekap I looked into this a bit: one special character is double-escaped, and when the tool handles it the JSON becomes invalid. As a workaround, open the example file you sent, search for "\u2026", and remove one "\" at the beginning. Save the file and rerun the import, and this specific item will be imported fine.
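The failure mode described here can be reproduced in a few lines. The sketch below is in Python rather than the tool's C#, and it assumes the cleanup step is a plain regex removal using the pattern quoted later in this thread. The helper names and the sample JSON are made up for illustration and are not taken from the attached file.

```python
import json
import re

# Same pattern text as the C# verbatim string quoted later in the thread:
# a literal backslash, "u", then four characters in the range '0'..'F'.
CLEANUP = re.compile(r"\\u[0-F]{4}")


def strip_unicode_escapes(serialized: str) -> str:
    """Remove backslash-u escape sequences from raw JSON text (hypothetical helper)."""
    return CLEANUP.sub("", serialized)


def loads_ok(serialized: str) -> bool:
    """Return True if the text parses as JSON (hypothetical helper)."""
    try:
        json.loads(serialized)
        return True
    except json.JSONDecodeError:
        return False


# Made-up samples: one with a normal escape, one with a double-escaped one.
single = r'{"Title": "Loading\u2026 done"}'
double = r'{"Title": "Loading\\u2026 done"}'

for label, text in (("single", single), ("double", double)):
    cleaned = strip_unicode_escapes(text)
    print(label, "before:", loads_ok(text), "after:", loads_ok(cleaned), cleaned)
```

In the double-escaped sample the regex removes the second backslash together with "u2026" and leaves the first backslash behind. That lone backslash followed by a space is not a valid JSON escape, which would explain why removing one "\" by hand before the import avoids the failure.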

MOlausson commented 4 years ago

@gekap did you manage to get around this?

madkoo commented 4 years ago

@gekap closing issue due to inactivity. If you still have problems feel free to reopen the issue.

gekap commented 4 years ago

Hello

I get the same error again. The file is attached; I tried to fix it, but I still get the same error.

ODS-1484.zip

Please check it URGENTLY

omustapasa commented 1 year ago

@Alexander-Hjelm @madkoo This is still an issue with the master branch.

I checked the source code and saw that there is a regex that cleans up such unicode before the JSON is parsed and converted into a WorkItem object. However, my own import log shows exported files that failed because they contain escaped unicode that the regex in the source code does not match.

serialized = Regex.Replace(serialized, @"\\u[0-F]{4}", ""); only looks for values like "\u2026"; however, a few of my recent items (bugs that include a stack trace) have values like "\u00602", hence 5 characters instead of 4.

Exception of type \\u0027System.OutOfMemoryException\\u0027 was thrown.
at System.Collections.Concurrent.ConcurrentBag\\u00601.ToArray()
at System.Collections.Concurrent.ConcurrentBag\\u00601.GetEnumerator()

You might want to update the Regex.Replace pattern to something like "\u[0-F]{4,5}" or "\u[0-F]{4,}".
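Before changing the quantifier, it may help to see what each variant removes from the stack trace line above. This is a rough Python sketch, not the tool's C# code. It assumes Python's `re` treats these patterns the same way .NET does for this input, and the variable names are hypothetical.

```python
import re

# The existing pattern and the proposed {4,5} variant, written as Python raw strings.
FOUR = re.compile(r"\\u[0-F]{4}")
FOUR_OR_FIVE = re.compile(r"\\u[0-F]{4,5}")

# Fragment copied from the stack trace in this comment (double backslash included).
fragment = r"ConcurrentBag\\u00601.ToArray()"

after_four = FOUR.sub("", fragment)
after_four_or_five = FOUR_OR_FIVE.sub("", fragment)

# {4} removes the second backslash plus "u0060" and keeps the trailing "1".
print(after_four)
# {4,5} is greedy, so it also swallows the "1" that follows "0060".
print(after_four_or_five)
```

Under these assumptions, both variants leave one backslash behind whenever the escape is double-escaped, so the quantifier alone may not make the file parse. Note also that "\u0060" is the escape for a backtick, and .NET writes generic type names as ConcurrentBag followed by a backtick and the arity 1. The trailing "1" may therefore be ordinary text rather than a fifth hex digit, in which case {4,5} would delete a character that belongs to the type name.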

FYI