ppxml2db.py Format Fails in Windows

flywire commented 2 months ago

E:\>cd \sqlite

E:\sqlite>if exist kommer.db del kommer.db

E:\sqlite>
E:\sqlite>sqlite3
SQLite version 3.36.0 2021-06-18 18:36:39
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite>
sqlite> .open kommer.db
sqlite> .read account.sql
sqlite> .read account_attr.sql
sqlite> .read attribute_type.sql
sqlite> .read bookmark.sql
sqlite> .read config_entry.sql
sqlite> .read config_set.sql
sqlite> .read dashboard.sql
sqlite> .read latest_price.sql
sqlite> .read price.sql
sqlite> .read property.sql
sqlite> .read security.sql
sqlite> .read security_attr.sql
sqlite> .read security_event.sql
sqlite> .read security_prop.sql
sqlite> .read taxonomy.sql
sqlite> .read taxonomy_assignment.sql
sqlite> .read taxonomy_category.sql
sqlite> .read taxonomy_data.sql
sqlite> .read watchlist.sql
sqlite> .read watchlist_security.sql
sqlite> .read xact.sql
sqlite> .read xact_cross_entry.sql
sqlite> .read xact_unit.sql
sqlite> .quit

E:\sqlite>
E:\sqlite>python ppxml2db.py kommer.xml kommer.db
2024-04-27 20:30:01 INFO  Handling <security>
2024-04-27 20:30:01 INFO  Handling <watchlist>
Traceback (most recent call last):
  File "E:\sqlite\ppxml2db.py", line 378, in <module>
    conv = PortfolioPerformanceXML2DB(root)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\sqlite\ppxml2db.py", line 241, in __init__
    self.handle_watchlist(w)
  File "E:\sqlite\ppxml2db.py", line 168, in handle_watchlist
    fields = {"list": id, "security": self.uuid(sec)}
                                      ^^^^^^^^^^^^^^
  File "E:\sqlite\ppxml2db.py", line 55, in uuid
    el = self.resolve(el)
         ^^^^^^^^^^^^^^^^
  File "E:\sqlite\ppxml2db.py", line 45, in resolve
    ref = el.get("reference")
          ^^^^^^
AttributeError: 'NoneType' object has no attribute 'get'

E:\sqlite>

flywire commented 3 weeks ago

Discussion: https://forum.portfolio-performance.info/t/ppxml2db-import-export-portfolioperformance-xml-file-to-from-database/28839

flywire commented 2 weeks ago

Seems to be a slash issue again.

pfalcon commented 2 weeks ago

Seems to be a slash issue again.

The likely cause is a user error. Feel free to prove otherwise. This write-up will help you: https://www.chiark.greenend.org.uk/~sgtatham/bugs.html . As soon as you start asking questions on your side what may be wrong, you will either find a solution, or produce an actionable bug report.

flywire commented 2 weeks ago

I'm very competent using python for procedural programming and surprisingly I've used xml.etree before but I find it and OOP confusing. Logging shows security and watchlist objects are parsed but execution fails on the second line in the class below:

    def resolve(self, el):
        ref = el.get("reference")
        if ref is not None:
            norm = os.path.normpath(self.etree.getelementpath(el) + "/" + ref)
            el = self.refcache.get(norm)
            if el is None:
                el = self.etree.find(norm)
                self.refcache[norm] = el
        return el

Based on the next operation being to process slashes and that being a fundamental diference between linux and Windows I summised it might be the core issue again. I'd welcome any guidance to generate a bit of output here, suggesting relevant debug print lines as you've used in ppdb2xml.py would be great.

pfalcon commented 2 weeks ago

but execution fails on the second line in the class below:

Fails in which way? UPDATE: Ah, gotcha, probably the error you posted above a few days ago. Ok.

        norm = os.path.normpath(self.etree.getelementpath(el) + "/" + ref)

OMG, so os.path here too. Try following in the interactive prompt and show me output (below for linux):

>>> import os.path
>>> os.path.normpath("foo/bar/../baz")
'foo/baz'
>>> os.path.normpath("/foo/bar/../baz")
'/foo/baz'

flywire commented 2 weeks ago

E:\sqlite>python
Python 3.12.4 (tags/v3.12.4:8e8a4ba, Jun  6 2024, 19:30:16) [MSC v.1940 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os.path
>>> os.path.normpath("foo/bar/../baz")
'foo\\baz'
>>> os.path.normpath("/foo/bar/../baz")
'\\foo\\baz'
>>>

pfalcon commented 2 weeks ago

'foo\baz'

Then the same workaround as was done before in db2ppxml: https://github.com/pfalcon/ppxml2db/commit/c75e45a29530d080724e7924451bfbb3125fa9fb . Pushed.

flywire commented 2 weeks ago

Following on from https://github.com/pfalcon/ppxml2db/issues/5#issuecomment-2198200645

test.bat

@echo on
if exist kommer.db del kommer.db
sqlite3 <make.txt
python ppxml2db.py kommer.xml kommer.db
python db2ppxml.py kommer.db kommer2.xml

make.txt

.open kommer.db
.read account.sql
.read account_attr.sql
.read attribute_type.sql
.read bookmark.sql
.read config_entry.sql
.read config_set.sql
.read dashboard.sql
.read latest_price.sql
.read price.sql
.read property.sql
.read security.sql
.read security_attr.sql
.read security_event.sql
.read security_prop.sql
.read taxonomy.sql
.read taxonomy_assignment.sql
.read taxonomy_category.sql
.read taxonomy_data.sql
.read watchlist.sql
.read watchlist_security.sql
.read xact.sql
.read xact_cross_entry.sql
.read xact_unit.sql
.quit

E:\sqlite>test

E:\sqlite>if exist kommer.db del kommer.db

E:\sqlite>sqlite3  0<make.txt

E:\sqlite>python ppxml2db.py kommer.xml kommer.db
2024-06-30 08:46:03 INFO  Handling <security>
2024-06-30 08:46:03 INFO  Handling <watchlist>
2024-06-30 08:46:03 INFO  Handling <account>
2024-06-30 08:46:03 INFO  Handling <portfolio>
2024-06-30 08:46:03 INFO  Handling <account-transaction>
2024-06-30 08:46:03 INFO  Handling <portfolio-transaction>
2024-06-30 08:46:03 INFO  Handling <crossEntry>
E:\sqlite\ppxml2db.py:282: FutureWarning: This search incorrectly ignores the root element, and will be fixed in a future version.  If you rely on the current behaviour, change it to './/crossEntry'
  for x_el in self.etree.findall("//crossEntry"):
2024-06-30 08:46:03 INFO  Handling <taxonomy>
2024-06-30 08:46:03 INFO  Handling <dashboard>
2024-06-30 08:46:03 INFO  Handling <properties>
2024-06-30 08:46:03 INFO  Handling <settings>

E:\sqlite>python db2ppxml.py kommer.db kommer2.xml

E:\sqlite>dir kommer?.xml
...
30/06/2024  08:45 AM           505,248 kommer.xml
30/06/2024  08:46 AM           517,709 kommer2.xml

Files are identical except for linux/Windows line endings. This has no practical difference and np++ compareplus reports the files as identical. Regardless, it would probably be better to force linux line endings as generated by PP.

kommer 2.zip

Proof that kommer.xml still contains non-ASCII chars:
$ grep --color='auto' -P -n "[\x80-\xFF]" kommer.xml
5163:                    <name>St. Barthélemy</name>
$
(Yes, only one, but should be enough.)

This is too subtle, it is in the regenerated xml file but I can't find it in PP.

pfalcon commented 1 week ago

Closing this too, XPath path separator issues were fixed.

pfalcon / ppxml2db

ppxml2db.py Format Fails in Windows #4