greeny / SatisfactoryTools

Satisfactory Tools for planning and building the perfect base.
https://www.satisfactorytools.com/
MIT License
255 stars 56 forks

[Feature Request] Add capability to export item flow digraph adjacency matrix to CSV #90

Open wizard1073 opened 2 years ago

wizard1073 commented 2 years ago

With very large digraphs, it can be too complex to read and analyze the structure visually through the web interface. Exporting the digraph of the item flow to CSV would allow it to be imported into tools that automate the analysis. For example, the digraph of item flow can be treated as a Design Structure Matrix (DSM), a systems engineering tool that uses topological sorting and grouping of related graph nodes. One tool in particular from MIT is a macro-enabled Excel spreadsheet that automates the analysis and partitioning of the DSM, and it is an ideal candidate for importing the CSV version of the digraph from this tool.

Thank you for developing such a great tool!

greeny commented 2 years ago

hi, how would you imagine such an export? :thinking: I have pretty much no experience with saving graph structures in CSV, so if you have any suggestions, I'm open to them.

Anyway, import/export for various formats is a planned feature, so this can very much be part of that.

wizard1073 commented 2 years ago

I expect you have the adjacency data internally in an array, unless you are using a more compact adjacency list (linked list form). A CSV export of the adjacency matrix would have one row for each item in the graph. Making fuel from crude oil would look like this:

Oil,0,0,0
Fuel,1,0,0
Polymer Resin,1,0,0

The column names are unneeded because they are identical to the row names. There is a "1" wherever Oil "causes" another item, in this case Fuel and Polymer Resin. I have suppressed the machinery because it is the "physical architecture", while the item flow by itself is the "logical" or "behavioral" architecture, which is the focus of analysis. A similar approach could be taken for the physical architecture (machines and splitters/mergers), but there would need to be multiple "layers" of the data to fully represent that complexity.

Since we have at most ~90 items, a full adjacency matrix would have ~8,100 entries, most of which would be zeros. A sparse representation is smaller because it eliminates all of the zeros, but it would have to be converted before it can be imported into Excel. I estimate a full CSV file with 90 items and ~5% non-zero entries would be less than 200 kB at most.
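For illustration, here is a minimal sketch of the kind of export I have in mind; the `ItemEdge` shape and the `adjacencyMatrixCsv()` helper are hypothetical names for this example, not anything that exists in SatisfactoryTools:

```ts
// Minimal sketch of the proposed CSV export, assuming the item flow is available
// as a list of directed edges between item names. The ItemEdge shape and the
// adjacencyMatrixCsv() helper are hypothetical, not part of SatisfactoryTools.
interface ItemEdge {
    from: string; // producing item, e.g. "Oil"
    to: string;   // produced item, e.g. "Fuel"
}

function adjacencyMatrixCsv(items: string[], edges: ItemEdge[]): string {
    const index = new Map<string, number>();
    items.forEach((name, i) => index.set(name, i));

    // Full |items| x |items| matrix, initialised to zeros.
    const matrix = items.map(() => new Array<number>(items.length).fill(0));
    for (const edge of edges) {
        const row = index.get(edge.to);
        const col = index.get(edge.from);
        if (row !== undefined && col !== undefined) {
            matrix[row][col] = 1; // "1" wherever the column item feeds the row item
        }
    }

    // One row per item, row name first; no header row, since column order = row order.
    return items.map((name, i) => [name, ...matrix[i]].join(',')).join('\n');
}

// The fuel example from above: Oil feeds both Fuel and Polymer Resin.
console.log(adjacencyMatrixCsv(
    ['Oil', 'Fuel', 'Polymer Resin'],
    [{from: 'Oil', to: 'Fuel'}, {from: 'Oil', to: 'Polymer Resin'}],
));
// Oil,0,0,0
// Fuel,1,0,0
// Polymer Resin,1,0,0
```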

greeny commented 2 years ago

I currently have an array of nodes with IDs and an array of edges with [from - to] definitions. Keep in mind that an edge can go from A to B and from B to A at the same time (e.g. the recycled plastic/rubber loop). I'm not sure if that can be put into a matrix.

Also, you shouldn't count items, but recipes. Each node is one recipe. We have ~150 recipes, so technically you can go over ~22k entries, which is quite a lot imo.

wizard1073 commented 2 years ago

If I understand correctly, you have the adjacency information for the recipes, and you have a definition for [from - to], which can be exported as-is or transposed (both forms are valid). Entries can appear in both the upper and lower triangle; that indicates a feedback loop, which can occur in production lines.
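To illustrate with the loop you mentioned (item names here are just placeholders, and the matrix layout follows the sketch above):

```ts
// Self-contained sketch of the loop case: a directed adjacency matrix is not
// required to be symmetric, so A -> B and B -> A simply fill two different cells.
// Item names are placeholders for illustration only.
const items = ['Plastic', 'Rubber'];
const edges: Array<[string, string]> = [
    ['Plastic', 'Rubber'], // plastic feeds the rubber recipe
    ['Rubber', 'Plastic'], // rubber feeds the plastic recipe
];

const index = new Map<string, number>();
items.forEach((name, i) => index.set(name, i));

const matrix = items.map(() => items.map(() => 0));
for (const [from, to] of edges) {
    matrix[index.get(to)!][index.get(from)!] = 1;
}

console.log(matrix);
// [ [ 0, 1 ], [ 1, 0 ] ] -- one entry above and one below the diagonal: the feedback loop
```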

I would like to understand how to get the item flow data from your internal data representation. What gets drawn onscreen has machines as vertexes and flow rates of each item as edges. The vertexes have additional information: number of machines, machine type, and efficiency. When we suppress calculating mergers and splitters, the graph resolves to just calculated item flow. We only need to export the edge data from this simplified form of the graph.

By saying "each node is one recipe", does that mean you are storing the input rates and output rates of the recipe? Do you store the numbers after the desired production rates are calculated? It sounds like there would need to be a mapping process that converts from recipe-based vertexes to item vertexes. Each recipe has a different name, but some output names are just variants and map to the base name. Once the name is reduced to the base name, the adjacency matrix rows can be created for each recipe output using the same inputs for each row.
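To sketch the mapping step I am imagining (the `Recipe` shape and the `baseItemName()` rule below are hypothetical placeholders, not the tool's actual data model):

```ts
// Hedged sketch of mapping recipe-based vertexes to item vertexes. The Recipe shape
// and the baseItemName() rule are assumptions for illustration only.
interface Recipe {
    name: string;
    inputs: string[];  // item names consumed by the recipe
    outputs: string[]; // item names produced by the recipe
}

// Reduce variant/alternate output names to a base item name. The real rule would
// depend on how the tool actually names its items.
function baseItemName(name: string): string {
    return name.replace(/^Alternate:\s*/, '');
}

// Every input of a recipe "causes" every output of that recipe, so each recipe
// contributes input -> output edges to the item-level digraph.
function itemEdges(recipes: Recipe[]): Array<{from: string; to: string}> {
    const edges: Array<{from: string; to: string}> = [];
    for (const recipe of recipes) {
        for (const input of recipe.inputs) {
            for (const output of recipe.outputs) {
                edges.push({from: baseItemName(input), to: baseItemName(output)});
            }
        }
    }
    return edges;
}
```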

If I understand this correctly, then if all recipes were utilized in one graph, the adjacency matrix for the item flow would have ~22,500 entries, most of which are zero, so the file size is still at most ~1 MB. The MIT tool can handle up to 250 items (62,500 entries), so this is still feasible. This is also why we look at automation tools to handle such complex production setups!

greeny commented 2 years ago

It's a bit complicated how exactly the tool produces the final visualisation. If you have Discord, you can add me there and we can talk about it (greeny#4945). However, I'll try to put the basics here as well:

wizard1073 commented 2 years ago

I will see if you are on Discord Friday evening (US east coast time) and we can chat more then. Thank you!

wizard1073 commented 2 years ago

I went through the code extensively. Very clean code, practically self-documenting!

It looks like the information stored in RecipeNode (src/Tools/Production/Result/Nodes/RecipeNode.ts) needs to be saved to a central data store so that a separate process (the "export adjacency matrix to CSV" step) can parse the array, create the adjacency matrix from the ingredients field in each RecipeNode, and export it. If I understand correctly, this is the subset of recipes the solver selected from all possible/allowed recipes sent to the solver, multiplied by scaling factors (per recipe) to make the production chain meet the desired rates of the user-selected products. It also looks like this data currently only goes to the graph processor.
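To make that concrete, here is a very rough sketch of what such an export step could look like; the shapes of the `ingredients`/`products` fields below are assumptions for illustration, not the actual RecipeNode definition:

```ts
// Hedged sketch of an "export adjacency matrix to CSV" step over the solver result.
// The ingredients/products shapes here are assumed for illustration; the real
// RecipeNode (src/Tools/Production/Result/Nodes/RecipeNode.ts) may differ.
interface RecipeNodeLike {
    recipeName: string;
    ingredients: {item: string; amount: number}[]; // items flowing into the recipe
    products: {item: string; amount: number}[];    // items flowing out of the recipe
}

// Recipe-level adjacency: row recipe R has a "1" in column C if any product of C
// is an ingredient of R. One row per recipe node, columns in the same order as rows.
function recipeAdjacencyCsv(nodes: RecipeNodeLike[]): string {
    const rows: string[] = [];
    for (const consumer of nodes) {
        const wanted = new Set(consumer.ingredients.map((ingredient) => ingredient.item));
        const row = nodes.map((producer) =>
            producer.products.some((product) => wanted.has(product.item)) ? 1 : 0,
        );
        rows.push([consumer.recipeName, ...row].join(','));
    }
    return rows.join('\n');
}
```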

I'm up on discord now if you want to chat outside these comments.