joyfullservice / msaccess-vcs-addin

Synchronize your Access Forms, Macros, Modules, Queries, Reports, and more with a version control system.
Other
211 stars 41 forks source link

Export All Table Data As XML or Tab Delimited #250

Open OnBeyondBeing opened 3 years ago

OnBeyondBeing commented 3 years ago

Clicking on the menu item 'VCS Export All Source' does not export any table data. Neither does clicking on 'Export All Source' in the main VCS dialog after checking 'Full Export'.

It seems like the only way to export all table data is to manually click on each table in 'VCS Options' -> 'Table Data' and change the 'Data to Export' from blank to 'Tab Delimited' or 'XML Format'. For a database with dozens (or hundreds) of tables, this can be very tedious.

Would it be possible to add a procedure to export the data from all tables in the database? Alternatively, would it be possible to add a button in 'VCS Options' -> 'Table Data' to set all displayed tables to save as XML or Tab Delimited?

Alternatively, if there's a simple way to do this in VBA, I could try that.

Thanks again for this amazing tool!

joyfullservice commented 3 years ago

Are you wanting to include the table data in version control like Git or Subversion, or are you using it for something else? Normally version control is used more for the application design and configuration, and a backup system is used for the data itself. Can you explain a little more of what you are wanting to do? 😄

hecon5 commented 3 years ago

@OnBeyondBeing: This may have unintended consequences, especially for linked tables or system tables (especially visible ones).

There are "hidden system tables" that get exported via alternate means (msysresources for one); exporting them via XML may cause some conflicts.

The "integrated" VCS for this does allow for local table export, but IMO, "production" data should be saved in a different manner than the VCS, for the reasons @joyfullservice mentioned. It also could carry proprietary / special handling requirements that you very much do not want to be in your code base (security/regulatory/criminal consequence type stuff).

If you want your local table data (data in this case being your "production" data) I'd suggest writing a function to do this that's part of your local database; and specify that function to perform prior to export . This would allow you to back things up whenever, and anyone who's managing your database could do this, even non-developers.

For local tables which allow you to control the front end and are part of your development project (for me, I have a local table that defines the production/test locations to allow switching back-ends), and a local table to hold some dynamic defaults and preference settings and a specialized shared image table, so there are really only a few tables that aren't part of my data set.

This allows me to have a split database, and makes the whole thing a lot more resilient in the case of me / someone else accidentally blowing up the world by mucking with production data while developing.

I don't think this is a feature we should implement, as I think making it mildly more difficult to mix your production data with your code base is a good thing in this case.

OnBeyondBeing commented 3 years ago

@joyfullservice

I'm working with a 'legacy codebase' that uses a common, but absolutely useless versioning system. Here's an artist's rendering.

I'm trying to find out what differences exist between the files. The Database Compare tool that comes with Office is not remotely particularly useful. Neither is doing a binary diff on the files themselves. I even tried a pay program called AccdbMerge, but that kept crashing.

Each database I'm working with is pretty small -- under 100 tables each with under 100 fields and under 10,000 rows. So if I can get all the data into text, then I can use standard diffing tools to figure out what's going on.

I know that version control is 'normally' used for source code, but the reasons for keeping source code in a VCS also apply to small or rarely-edited data sets. It's easier to track changes, create branches to work on, and return to known, functional states. The fact that 'professional' database files are normally too big to keep in git seems like an accident of history. If RAM and storage had developed faster, we might not have minded putting multi-GB files in git. I'm sure there are image and video artifacts on GitHub that would have been totally unacceptable in the age of cvs or svn.

I would guess that the vast majority of Access applications are small enough to easily keep in git. A lot of people use Access for small projects; others do the front end in Access and keep most of the data on a back-end server. In either case, I think that those Access files and their data tables would benefit from version control.

Besides that, I think most people expect version control to be able to completely recreate a functionally-identical copy of the build object -- in this case the mdb or accdb file. If you're worried about large tables, maybe you could put in a warning for tables over a certain size.

But even if you don't think that extracting all table data should be the default, and even if you don't think that extracting all table data should have a dedicated button, I think it really makes sense to be able to set all data tables to be extracted in a few clicks. Maybe you could figure out a way to apply one status change to multiple selected tables.

OnBeyondBeing commented 3 years ago

@hecon5, you make some excellent points.

I don't think msys* tables should be extracted by default. Most users don't even know they exist, and it makes more sense for msaccess-vcs-integration to manage most of them in code when doing a build. I'm honestly having trouble thinking of a use case for manually versioning most of them.

I agree that tables could contain proprietary data or other secrets, but so can queries, forms, reports, and VBA code. Imagine a query with a sensitive string in the WHERE clause or a report with a label like 'People We're Going to Fire'. Managing secrets is hard, especially in version control. But people have to worry about that in the Access file itself anyway. How is having to worry about a folder of text files any different?

At your suggestion, I'm writing a VBA function that will use Application.ExportXML to export all table data to the msaccess-vcs-integration default folder. I'll post it if and when I get it working.

But regardless of whether I get something else working, I don't think that there is as clear a line between code and data as we might want there to be. Access projects especially tend to blend them together. So I think it makes more sense for a VCS to default to saving everything, and then allow the user to exclude things that are too big or too ephemeral to track.

hecon5 commented 3 years ago

Hmm, these are fair points.

In terms of sensitive data, you're correct, a query with a field, peopletobefired is a field one would have raised eyebrows at. However, it's not actionable. It doesn't tell you "who" is getting fired, only that that column has that data. In a properly secured (split!) Dataset, only the authorized users should have access to that query anyway.

I agree, though, Access definitely blurs the line, as does Excel, and others. The fact it's blurry doesn't make it a good practice to put your code with your data set, though.

However, yours is a very interesting issue. You can't just go adding before export subroutines to an already existing file to diff a legacy file, and if you're trying to get a handle on version control, to split your data from the front end later, that further exacerbates it.

I think the best way to handle this is to add a export all local tables button on the table config tab. Because we don't want to encourage users to comingle code and data (Able to doesn't make it right), I don't think it should be part of the typical export.

But I do think it would be very handy, especially in this case.

That said, if you intend to use git (or other version control), be advised: XML is quite unfriendly with git. I'm dealing with it in another context, and because XML is order insensitive, it's led to a ton of false positive changes and collisions.

Therefore, I suggest if you're exporting to track changes, use CSV or TSV for this and other compares.

joyfullservice commented 3 years ago

@OnBeyondBeing - Thanks for sharing the additional background on your environment. I totally understand how developers sometimes inherit projects that were not necessarily developed using best practice methods and the "ideal" solution doesn't always work very well in real life. 😄

Like @hecon5 pointed out, we don't necessarily want to encourage people to routinely export all the tables in the database, but in a case like yours, it certainly sounds like a helpful option to be able to compare various "versions" of a project.

As I was pondering how to implement something like that, I am kind of leaning towards allowing a user to enter * in the following input box to have it export all non-system local tables in the current database.

image

This would be helpful in a case like yours because you could set your default options to include the * table export, then run the export in each of your databases, which would then save the options file for each database to include the wildcard. Any new tables added would automatically be picked up for export.

Functionally, after using this option, the list of tables would include a single wildcard record in place of the local tables. System or linked tables could still be selected individually as desired. Setting the export type to None for the wildcard item would reset the list back to the normal functionality.

This would provide the functionality to the specific use cases where this would be helpful, but not make it so obvious that new users will default to using it all the time, since it technically isn't the direction we would want to steer people with a version control integration.

hecon5 commented 3 years ago

This is actually exactly how the integrated version of this VCS works (well, the flavor with a config file I used), so I think is a good direction. Only change I'd add is to add a note that most tables should not normally be exported.

bclothier commented 1 year ago

I have few questions here:

1) is there any technical reasons why we would want CSV or tab delimited as output? I generally regard them as the worst option because there is no standard for defining the output and far too many variants to make it useless for anything that's not just a bunch of strings or numbers. Dates, Booleans, currency all bring their own pain. I'm inclined to just provide XML option and be done with it.

2) Is there a reason why it can't be a subform with a table? I'm cool with saving the data into a JSON but for the presentation, I think it'll make for much nicer UX to have a subform where you can set several values or keyboard around the list quickly and select. In our projects, we might have anything from 5-20 tables that are used to drive program logic and therefore their data are "code" and needs to be in version control. Having to set the table one by one, reload the listbox and scroll through the long list of tables really get tedious fast.

3) This is incidental but I am also finding that if we have a project that's starting out with the version control, it would be nice to get prompted on the first export to ask if we want to identify some tables' data to be exported and then remember this. That way, we avoid the exports forgetting to export the table data which would then build a useless database because the table data used for program logic are now missing. Does that sounds like something worth implementing?

joyfullservice commented 1 year ago
1. is there any technical reasons why we would want CSV or tab delimited as output?

That was the output provided in the earliest versions of the add-in. I added the XML output years ago because it was so much more reliable for structuring the data, but it was initially pretty atrocious from a source readability standpoint. Adding line breaks and indenting went a long way in making the data more readable, but for the layman, a simple CSV offered a more friendly view of the data itself. (Especially in the case of a simple config table with a few columns of data.)

I used to use CSV/TDF on occasion, but now I only use XML. I can see how some would appreciate the JSON option, and this wouldn't be too difficult to add since we are already using the parsing libraries elsewhere. (Although I have had to make a couple tweaks for UTF-8 compatibility, and my PR is still sitting open on the parent project.)

I am open to the idea of phasing out or dropping support for CSV/TDF, and certainly on the import side. Originally the primary purpose was to review the table data through VCS, but the focus of the add-in has shifted more in recent years to building functional databases from source code. For this reason I think XML (and possibly JSON as an alternative) is going to be a better path forward for the long term.

I just want to give some consideration to those that may be using the CSV/TDF feature currently. That being said, version 4 is probably a great place to make that break if that is the general consensus from some of the more seasoned voices in the field.

2. Is there a reason why it can't be a subform with a table? I'm cool with saving the data into a JSON but for the presentation, I think it'll make for much nicer UX to have a subform where you can set several values or keyboard around the list quickly and select. 

That sounds like a good idea to me. I took this approach with the conflict resolution screen, and was pretty happy with the way that turned out. In the current form it is rigidly compact in a very small area, and would make future things like adding translations more difficult. It would probably be easier to use a larger dedicated form for this selection. (You could copy the conflict resolution form as a template.)

3. This is incidental but I am also finding that if we have a project that's starting out with the version control, it would be nice to get prompted on the first export to ask if we want to identify some tables' data to be exported and then remember this.

That's a good thought, although the one consideration might be the intended target user base. There are a handful of experienced developers with more advanced database systems that involve local config tables. Most of these would already be pretty familiar with version control concepts.

But another, and perhaps even larger part of the user base would be those that are fairly new to the version control concepts. Maybe they have been maintaining and building a database for years, and recently started learning about version control. One of my desires with this add-in project is to help guide those users into the benefits of VCS without being intimidated by complex tools with steep learning curves. That's why I have put so much time into the UX elements like the ribbon toolbar, progress bars and simple forms with clear actions.

As new users are grappling with these concepts, they see a list of their tables and think, "Yeah, I want to save all those tables!" Select-all, click, done. A few months later, (hypothetically) they are sweating bullets under the angry scowl of their boss... "You did what!!! You uploaded all the customer PII to GitHub! Just wait till Legal hears about this!"

In my personal opinion, many of the newer, less experienced users may have a little harder time understanding the difference between control tables and data tables. If we make it too easy and convenient to export table data, it enables them to make decisions that might end up hurting them later on. I don't want to limit them with the tool, but I want to guide or encourage them toward wise design and development strategies that will bring them the greatest long-term benefit.

It is a little tedious to select multiple tables to save as XML, but it is a one-time setup that is saved with the project in the vcs-options.json file. (You can also edit this file directly, if that is easier.)

Well, I guess that's kind of a long-winded answer, but hopefully conveys some of the reasoning behind why it's not easier to save table data. 😄

bclothier commented 1 year ago

Thank you for the explanation. I can appreciate that giving too much options unfortunately inadvertently create problems for them. In the scenario I envisioned, I wanted to make sure I wouldn't breeze through the first export without thinking about all the choices I need to make and have the same expectations for my colleagues. However, my expectations of my colleagues probably shouldn't be imposed on the rest of users. That led me to think about whether having a "team policy" that can be enforced as the setup of the addin as a possible way to set the expectations. As interesting as it may be, I'm not sure ROI is there for that idea so I think I'll just make a point to train to always check the options and the initial export as well testing the first import.

With that out of the way, I think once one has found the options, they already passed the threshold of "just let me play" into "trust me, I know what I'm doing[citation needed]." so having a subform would make it easier to go through the list quickly. I don't think we'll want a button to select all table; they will have to do the selection themselves to indicate that those selected tables should be exported.

I will leave the question of CSV/TDF format until we have more consensus from others.

joyfullservice commented 1 year ago

Sounds great! 👍

On the topic of "team policy", the options are handled at three different levels. The project level, the user level, and the hard-coded defaults in the class. You may have observed this in the code, but you can take a set of options, and make them the "default" options for new projects. This writes a set of options to Version Control.json file in the add-in installation folder which is then used as the (user-level) base settings when creating a new project. (When the VCS add-in is used for the first time on a database project.)

image

Of course different projects may have different requirements, so that is why you also have a copy of the options in each project's source folder. The idea is that you start the project with your generally preferred options configured at the user level, then tweak as needed at the individual project level.

One way that you could potentially streamline this with a team of developers is to use Group Policy to update %appdata%\MSAccessVCS\Version Control.json to a company-specific standard template. That would probably work right out of the box, and could be helpful in getting each new project started with the same options.

hecon5 commented 1 year ago

Potentially Relates to #389