This seems to be a bug introduced by #314. While Access ExportXML may incorrectly encode the ampersand character in a TableDef index name, it DOES correctly encode table data.
The end result is that the output of ExportXML for table data is correct, but then calling SanitizeXML on the output file results in double encoding.
It appears the application of SanitizeXML must be restricted to only those files (or even parts of files) that require it.
For a local fix I'm just removing the call to SanitizeXML from clsDbTableData::IDbComponent_Export(). I'm not sure if this is the correct fix globally (it may need to be removed in other places), so I won't submit a PR.
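One way to restrict sanitizing so that it cannot double-encode is to make the escape step idempotent: escape a bare `&` only when it does not already begin an entity or character reference. The Python sketch below illustrates that idea only. It is not the add-in's actual SanitizeXML routine, the function name `escape_bare_ampersands` is hypothetical, and it assumes any `&name;` sequence in the input is an intentional entity reference.

```python
import re

# Matches an '&' that does NOT start a named entity (&gt;, &amp;, ...) or a
# decimal/hex character reference (&#39;, &#x27;). Only those bare ampersands
# are escaped, so running the function twice gives the same result.
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text: str) -> str:
    """Hypothetical helper: escape stray '&' without touching existing entities."""
    return _BARE_AMP.sub("&amp;", text)


if __name__ == "__main__":
    once = escape_bare_ampersands("Tom & Jerry &gt;=1")
    twice = escape_bare_ampersands(once)
    print(once)           # Tom &amp; Jerry &gt;=1
    print(once == twice)  # True
```

The trade-off is that literal text such as `&gt;` that was never meant as an entity is also left alone, so this only makes sense for content whose entities were already produced by ExportXML.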
Another thought... It appears MS have made a change to ExportXML sometime in the last few months. That same file, exported with my local fix, now produces this diff:
-<Comment>Observation Req'd</Comment>
+<Comment>Observation Req'd</Comment>
which is what I would expect, but does mean ExportXML is now encoding characters that previously were not encoded.
It might be necessary to evaluate if SanitizeXML is still needed.
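When evaluating whether SanitizeXML is still needed, it helps that this newer encoding is only cosmetic at the XML level: a literal apostrophe, the predefined `&apos;` entity and a numeric reference such as `&#39;` all parse to the same character data. A quick check with Python's standard `xml.etree.ElementTree`, used here only as a neutral XML parser (it is not what Access uses on import):

```python
import xml.etree.ElementTree as ET

# Three equivalent spellings of the same comment: a literal apostrophe, the
# predefined &apos; entity, and a numeric character reference. Which form a
# given ExportXML version emits is not assumed here.
literal = ET.fromstring("<Comment>Observation Req'd</Comment>").text
named = ET.fromstring("<Comment>Observation Req&apos;d</Comment>").text
numeric = ET.fromstring("<Comment>Observation Req&#39;d</Comment>").text

# All three decode to the same character data.
print(literal == named == numeric)  # True
print(named)                        # Observation Req'd
```

A diff like this is therefore noise rather than a data change; the real problem only appears when an already-encoded `&` is encoded a second time.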
Thanks for the research and testing on this! We can certainly limit the sanitizing by version if different versions of Access behave differently on the character encoding. I will try to do a little testing on Access 2010 when I have a chance...
Apart from comment text, this can break things if you have escaped characters in validation rules. For example, I found that when I had a ValidationRule of >=1 on a field, it was exported as <od:fieldProperty name="ValidationRule" type="12" value="&gt;=1"/> and when building from source again it came in as ">"=1 in the Validation Rule box.
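The round-trip failure follows from the fact that an XML parser decodes exactly one level of escaping. A minimal sketch with Python's `xml.etree.ElementTree`, where the element is a simplified stand-in for the od:fieldProperty line above (namespace prefix dropped) and `rule_after_import` is a hypothetical helper. It only shows what text the parser hands back; how Access then turns that text into ">"=1 is not modelled here.

```python
import xml.etree.ElementTree as ET


def rule_after_import(xml_fragment: str) -> str:
    """Return the ValidationRule text a parser recovers from the fragment."""
    return ET.fromstring(xml_fragment).get("value")


# Encoded once (correct): the parser recovers the original rule.
print(rule_after_import('<fieldProperty name="ValidationRule" value="&gt;=1"/>'))
# prints >=1

# Encoded twice: the '&' became '&amp;', so the entity text survives decoding.
print(rule_after_import('<fieldProperty name="ValidationRule" value="&amp;gt;=1"/>'))
# prints &gt;=1
```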
Microsoft® Access® for Microsoft 365 MSO (Version 2209 Build 16.0.15629.20152) 32-bit VCS add in version 3.4.23
My current workaround is scripting an edit of the exported tbldef XML files, changing & only when it is the start of entities like &gt; to > and &lt; to <, but not when it is just escaping an instance of & that isn't the start of another entity reference.
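The workaround above can be sketched in Python as a single regex pass. This generalizes it from &gt;/&lt; to any named or numeric reference (narrow the alternation to `lt|gt` to match it exactly). The helper names are hypothetical, and the file handling assumes the exported XML is UTF-8; `newline=""` keeps the original line endings intact.

```python
import re

# A double-encoded entity looks like "&amp;gt;": the '&' of a valid entity was
# itself escaped. Collapse "&amp;" back to "&" only when what follows is a
# complete entity or character reference; a lone "&amp;" (an escaped '&' in
# the data) is left untouched.
_DOUBLE_ENCODED = re.compile(
    r"&amp;(?=(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def undo_double_encoding(xml_text: str) -> str:
    """Hypothetical helper: remove one extra level of '&' escaping."""
    return _DOUBLE_ENCODED.sub("&", xml_text)


def fix_file(path: str) -> None:
    """Rewrite an exported XML file in place if it contains double encoding."""
    # Assumes UTF-8; newline="" preserves the file's existing line endings.
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    fixed = undo_double_encoding(text)
    if fixed != text:
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(fixed)
```

Because the match requires a trailing `name;`, an escaped ampersand such as `Tom &amp; Jerry` survives unchanged, which is the "but not when it is just escaping an instance of &" rule.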
I just came across this issue recently when I tried to migrate my few local tables from Tab Delimited to XML. In particular, it botched the USysRibbons table, which has a RibbonXml field used to customize the ribbon.
Before (in this custom ribbon, I only hide the Help toolbar):
After exporting and rebuilding from source:
For the time being, I will keep the exported data as Tab Delimited for this table.
But I do think this issue should be fixed. @joyfullservice did you look into it? Do you need help?
Does it have this problem in the latest build on the dev branch? I am thinking we had worked through some XML issues a little while back... We also have some recent updates to rework the entire XML structuring, thanks to @bclothier's excellent contributions to refactor the XML processing to use XSL.
Let me know if you still see this issue after building the dev branch from source...
Ah, I didn't think of checking the dev branch! I was waiting for a stable release 😸
Anyway, checking with the latest release, 4.0.9, I was able to export and reimport successfully, and it works as expected! So this is great. Thank you, and I'm sorry to have bothered you without searching properly.
Maybe you can add this information as a "known issue" of the 3.x branch.
As a side note, the export time is roughly 30% longer in the 4.x branch than in 3.x, mainly due to the Tables category (x5) and the Read File operation (x10). I'm not really sure why; I haven't looked into it closely.
One concern is whether it's due to the changes introduced in #388. A quick test would be to compare against the version from this commit, which is the last one before #388 was merged.
I do expect some slowdown due to the new table connection checks introduced in the PR, but would like to confirm whether it's actually 30% slower, which seems surprising.
Please note that I used the latest released build, which is version 4.0.9. I'll try again with 4.0.10 once it is available and share the results.
@Indigo744 - Attached is a fresh build from the current dev branch, which would have all the latest updates, including the enhancements from @bclothier. Let me know what you see on the performance side... (You will notice that the performance reports now sort the items by time, so you can see the slowest items at the top of the list.)
Version Control_v4.0.10(dev).zip
I did a fair bit of testing on this with one of my most complex databases, comparing the 4.04 and current dev builds. After testing back and forth between the versions and running multiple full exports, the current version is consistently coming out faster.
The chart below compares the two versions, showing the seconds spent on some of the bigger operations:
Here are the actual performance reports from version 4.0.10 on this database during a full export:
--------------------------------------------------
PERFORMANCE REPORTS
--------------------------------------------------
Category Count Seconds
--------------------------------------------------
Forms 209 4.79
Tables 138 2.42
Reports 46 1.98
Queries 75 1.20
Modules 42 0.51
DB Connections 5 0.37
Doc Properties 1 0.29
DB Properties 54 0.09
VBE Forms 1 0.07
Macros 2 0.06
Shared Images 3 0.05
Nav Pane Groups 1 0.05
Table Data 1 0.04
Themes 1 0.04
VBE References 11 0.04
Hidden Attributes 0 0.04
VB Project 1 0.03
Project 1 0.03
Table Data Macros 0 0.01
Proj Properties 0 0.00
Relations 0 0.00
IMEX Specs 0 0.00
Saved Specs 0 0.00
--------------------------------------------------
TOTALS: 592 12.11
--------------------------------------------------
--------------------------------------------------
Operations Count Seconds
--------------------------------------------------
Read File 590 2.51
Sanitize File 332 2.03
App.SaveAsText() 332 1.92
Scan DB Objects 1 0.88
Increment Progress 1815 0.83
Save Table SQL 138 0.64
Convert to JSON 3606 0.57
Read File Bytes 632 0.39
Write File 723 0.35
Clear Orphaned Files 10 0.32
Get File Property Hash 1048 0.26
Save Query SQL 75 0.23
Console Updates 8 0.19
Compute SHA256 2076 0.18
Read File DevMode 255 0.15
Get VBA Hash 297 0.12
Check for linked table 138 0.12
Parse JSON 1 0.12
Export VBE Module 42 0.11
Delete File 377 0.10
Verify Path 1103 0.10
Get Modified Date 524 0.09
Enc. Windows-1252 as utf-8 42 0.06
App.ExportXML() 2 0.01
RunBeforeExport 1 0.01
Export Table Data as TDF 1 0.01
Quick Count Objects 1 0.00
Quick Count Files 1 0.00
Sanitize XML 2 0.00
Format XML 2 0.00
Write Binary File 3 0.00
Export Theme 1 0.00
Clear Orphaned Folders 1 0.00
--------------------------------------------------
Other Operations 0.06
--------------------------------------------------
@joyfullservice @bclothier After testing the latest 4.0.10 build, it is now only about 10% (4s) slower, which is a marginal difference.
3.4.23
--------------------------------------------------
PERFORMANCE REPORTS
--------------------------------------------------
Object Type Count Seconds
--------------------------------------------------
Project 1 0.01
VB Project 1 0.01
VBE References 7 0.03
Proj Properties 2 0.00
DB Properties 61 0.14
Shared Images 16 0.10
Themes 1 0.00
Tables 249 3.38
Queries 348 18.97
Forms 265 11.22
Macros 14 0.13
Reports 134 4.54
Table Data 4 0.09
Modules 48 0.70
Doc Properties 3 0.02
Nav Pane Groups 1 0.01
Hidden Attributes 3 0.00
--------------------------------------------------
TOTALS: 1158 39.36
--------------------------------------------------
--------------------------------------------------
Operations Count Seconds
--------------------------------------------------
Read File 1564 0.30
Parse JSON 386 0.29
Convert to JSON 10736 2.41
Compute SHA256 1751 0.08
Console Updates 7 0.08
Compare Dictionary 384 0.00
Get Modified Date 1061 0.11
Get File Property Hash 1061 0.20
Clear Orphaned 8 0.42
Write File 1394 0.47
Verify Path 2238 0.12
Write to Disk 16 0.00
Create Folder 1 0.00
Export Theme 1 0.00
Save Table SQL 249 0.22
App.ExportXML() 18 0.17
Read File Bytes 1178 4.72
Sanitize XML 18 0.01
Format XML 18 0.01
Increment Progress 69 0.48
App.SaveAsText() 761 11.33
Delete File 813 0.15
Sanitize File 761 2.88
Save Query SQL 348 3.51
Read File DevMode 399 0.15
Get VBA Hash 447 0.25
Export VBE Module 48 0.16
Enc. Windows-1252 as utf-8 48 0.40
--------------------------------------------------
Other Operations 11.63
--------------------------------------------------

4.0.10
--------------------------------------------------
PERFORMANCE REPORTS
--------------------------------------------------
Category Count Seconds
--------------------------------------------------
Queries 348 17.96
Forms 265 11.40
Tables 249 6.10
Reports 134 4.57
VB Project 1 0.92
Modules 48 0.89
DB Connections 1 0.76
Doc Properties 3 0.71
Shared Images 16 0.22
Hidden Attributes 3 0.19
DB Properties 61 0.19
Macros 14 0.17
Table Data 4 0.14
Nav Pane Groups 1 0.04
VBE References 7 0.04
Proj Properties 2 0.03
Table Data Macros 0 0.02
Themes 1 0.02
Project 1 0.01
Relations 0 0.00
IMEX Specs 0 0.00
VBE Forms 0 0.00
Saved Specs 0 0.00
--------------------------------------------------
TOTALS: 1159 44.38
--------------------------------------------------
--------------------------------------------------
Operations Count Seconds
--------------------------------------------------
Read File 1179 11.00
App.SaveAsText() 761 10.95
Read File Bytes 1236 4.60
Scan DB Objects 1 3.34
Save Query SQL 348 2.90
Sanitize File 761 2.90
Convert to JSON 8449 2.61
Save Table SQL 249 1.86
Increment Progress 3994 0.61
Write File 1762 0.56
Enc. Windows-1252 as utf-8 48 0.42
Clear Orphaned Files 9 0.42
Check for linked table 249 0.40
Get File Property Hash 2168 0.38
Get VBA Hash 447 0.23
Compute SHA256 4005 0.22
App.ExportXML() 18 0.20
Export VBE Module 48 0.17
Delete File 837 0.15
Read File DevMode 399 0.15
Parse JSON 1 0.13
Verify Path 2607 0.11
Get Modified Date 1108 0.10
Console Updates 7 0.03
Move File 21 0.02
Sanitize XML 18 0.02
Quick Count Objects 1 0.01
Format XML 18 0.01
Quick Count Files 1 0.01
Write Binary File 16 0.00
Export Theme 2 0.00
Create Folder 1 0.00
Clear Orphaned Folders 1 0.00
--------------------------------------------------
Other Operations 0.12
--------------------------------------------------
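To pin down where the time moved between the two reports, the Seconds column can be diffed row by row. A minimal Python sketch, assuming each row has the "name count seconds" shape used in the reports above. Rows without a count (such as Other Operations) are skipped, renamed rows (Clear Orphaned vs. Clear Orphaned Files) show up as separate entries, and the helper names are hypothetical rather than part of the add-in.

```python
import re

# A report row is "<name> <count> <seconds>". Ruler lines, headers and rows
# without a count (e.g. "Other Operations 11.63") do not match and are skipped.
_ROW = re.compile(r"^(?P<name>.*\S)\s+(?P<count>\d+)\s+(?P<secs>\d+\.\d+)\s*$")


def parse_report(text: str) -> dict[str, float]:
    """Map each row name to its Seconds value (hypothetical helper)."""
    rows = {}
    for line in text.splitlines():
        m = _ROW.match(line.strip())
        if m:
            rows[m.group("name")] = float(m.group("secs"))
    return rows


def biggest_changes(old: dict[str, float], new: dict[str, float], top: int = 5):
    """Rows present in either report, sorted by absolute change in seconds."""
    names = old.keys() | new.keys()
    deltas = {n: round(new.get(n, 0.0) - old.get(n, 0.0), 2) for n in names}
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]


if __name__ == "__main__":
    old = parse_report("Read File 1564 0.30\nTables 249 3.38\nOther Operations 11.63")
    new = parse_report("Read File 1179 11.00\nTables 249 6.10\nOther Operations 0.12")
    print(biggest_changes(old, new, top=2))  # [('Read File', 10.7), ('Tables', 2.72)]
```

Pasting both full reports into `parse_report` would rank every category and operation by how much its time changed.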
@joyfullservice How much work is still needed before a 4.0.x release? Is it production-ready yet, or should I wait for more stability?
One thing I can't help noticing is that in 3.4.23, Other Operations takes 11.63 seconds, whereas it takes 0.12 seconds in 4.0.10. That may indicate that the higher times reported for some categories, such as Tables, are actually a more accurate measurement. Read File went from 0.30 to 11.00 seconds.
I do not know enough about the performance measurement implementation to be sure whether that affects categories. For example, for Tables, 3.4.23 reports 3.38 seconds whereas 4.0.10 reports 6.10 seconds, but is that because it is now measuring the time more accurately than before? That does have the unfortunate side effect of obscuring where the actual slowdown is. I'm glad it's only 10% slower given the other changes that were added.
Thanks for posting the performance reports! That is really helpful, especially on a very large, complex database. One thing that has me a bit mystified is why we see such a performance difference in the Read File function... I have verified in the source code, and the function itself is identical between these versions. 🤔 If you take out this difference, version 4.0.10 is actually a few seconds faster overall, which would make sense to me, given some of the additional optimizations in the newer version. (In my testing I was finding the newer version generally slightly faster.)
I did notice something interesting with the Read File function on my computer yesterday. The read times seemed higher than I was expecting, and my computer was doing a lot more with memory, CPU and disk IO. Windows Explorer seemed to be using quite a bit of CPU, so I restarted the process. Subsequent exports went much faster, and the Read File function was back in the expected range. This might have been a fluke with my computer, but it was interesting to note.
Regarding the performance tracking, the newer version is going to be more accurate, especially in regard to the Other Operations. I got that cleaned up a bit more in the newer version to ensure we were tracking more operations that were slipping through the cracks in earlier versions.
@joyfullservice How much work is still needed before a 4.0.x release? Is it production-ready yet, or should I wait for more stability?
Great question! I have been using it in production, and it is working great for me. The main things remaining before release are finishing merge support for the last few remaining objects, then completing the merge build functionality. (This will allow you to merge a few changed source files into an existing database without needing to build the entire thing from scratch.) The merge build will be a game-changer in a multi-developer context because it allows you to quickly and easily merge in another developer's changes without having to stop and build everything from source.
The other significant change I am planning to implement before the general 4.0 rollout is splitting out the VBA code from form and report exports. This is discussed in more detail in #378, and would mean that a form has two source files: one with the object definition, and a corresponding class file with the VBA code. I am pretty close to finishing a way to make this split while still preserving the git history for those using git as their VCS back end.
I am pretty comfortable with v4 at this point, and don't really anticipate any other major breaking changes in this version as we head towards the general release. There is a little fine tuning left on the conflict detection (particularly in relation to orphaned files), but that is all new functionality anyway.
I have re-tested with v4.0.34 rather than 3.4.23, and the export and import of validation rules with '<' characters now works, as do table field comments with ', ", <, > and & characters. So I think this issue can be closed.
When exporting table contents, the latest version (3.4.23) seems to be double-escaping text in XML files.
Here's one sample diff from within a file that hasn't been exported since before I upgraded to 3.4.23:
If you were just escaping, I'd expect Observation Req'd, but it seems the escaping has been applied twice. I haven't tried building from source, but I suspect this is not correct.