simonw / advent-of-code-2022-in-rust

Copilot-assisted Advent of Code 2022 to learn Rust
120 stars 5 forks source link

Day 7: No Space Left On Device #9

Closed simonw closed 1 year ago

simonw commented 1 year ago

https://adventofcode.com/2022/day/7

Puzzle input looks like this:

$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k

determine the total size of each directory. The total size of a directory is the sum of the sizes of the files it contains, directly or indirectly

[...]

Find all of the directories with a total size of at most 100000. What is the sum of the total sizes of those directories?

Answer for above example is 95437

simonw commented 1 year ago

I'm going to try solving this using structs, inspired by https://fasterthanli.me/series/advent-of-code-2022/part-6 and similar.

simonw commented 1 year ago

Tried this with ChatGPT:

Given a file format like this: $ cd / $ ls dir a 14848514 b.txt

Write Rust code that uses the match operator to identify the four different types of line

Got back:

Here is an example of how you could use the match operator in Rust to identify the four different types of lines in the file format:

use std::fs;

fn main() {
    let file_contents = fs::read_to_string("file.txt").unwrap();

    for line in file_contents.lines() {
        match line {
            // A line that starts with `$ cd` is a "change directory" command.
            line if line.starts_with("$ cd") => println!("change directory: {}", line),
            // A line that starts with `$ ls` is a "list directory" command.
            line if line.starts_with("$ ls") => println!("list directory: {}", line),
            // A line that contains a file extension is a file.
            line if line.contains(".") => println!("file: {}", line),
            // Anything else is a directory.
            _ => println!("directory: {}", line),
        }
    }
}

This code will read the contents of a file named file.txt, and then iterate over each line of the file. For each line, it will use the match operator to determine the type of line and print a message to the console indicating the type of line.

simonw commented 1 year ago

I said:

Rewrite that to extract out the path in the case of cd, the directory name in the case of dir and the filesize and filename in the case of a file listing

Got back this code:

use std::fs;

fn main() {
    let file_contents = fs::read_to_string("file.txt").unwrap();

    for line in file_contents.lines() {
        match line {
            // A line that starts with `$ cd` is a "change directory" command.
            line if line.starts_with("$ cd") => {
                // Extract the path from the line by splitting on the space character and
                // taking the second element of the resulting array.
                let path = line.split(" ").nth(1).unwrap();
                println!("change directory: {}", path);
            }
            // A line that starts with `$ ls` is a "list directory" command.
            line if line.starts_with("$ ls") => println!("list directory: {}", line),
            // A line that contains a file extension is a file.
            line if line.contains(".") => {
                // Extract the file size and file name by splitting on the space character and
                // taking the first and second elements of the resulting array, respectively.
                let parts = line.split(" ").collect::<Vec<_>>();
                let file_size = parts[0];
                let file_name = parts[1];
                println!("file: {} ({})", file_name, file_size);
            }
            // Anything else is a directory.
            line => {
                // Extract the directory name by splitting on the space character and
                // taking the first element of the resulting array.
                let dir_name = line.split(" ").nth(0).unwrap();
                println!("directory: {}", dir_name);
            }
        }
    }
}

I'm going to write my own implementation inspired by these examples.

simonw commented 1 year ago

Copilot helped me write this:

#[derive(Debug)]
struct Directory {
    name: String,
    parent: Option<Box<Directory>>,
    children: Vec<Directory>,
    files: Vec<File>
}

I didn't know why it suggested Option<Box<Directory>> so I asked ChatGPT:

[derive(Debug)]

struct Directory { name: String, parent: Option<Box>, children: Vec, files: Vec } What does Option<Box> do?

Which replied:

Option<Box<Directory>> is a type that represents an optional value that contains a boxed Directory value. In this case, it is used to represent the parent of a Directory.

In Rust, the Option type is a generic enum that can be used to represent the presence or absence of a value. It has two variants: Some(T), which represents the presence of a value of type T, and None, which represents the absence of a value. The Box type is a smart pointer that can be used to store values on the heap. It is often used to store values that have a known size at compile time but an unknown size at runtime, such as when using recursive data structures like linked lists and trees.

In this case, the Option<Box<Directory>> type is used to represent the parent of a Directory. If a Directory has a parent, it will be represented as Some(Box<Directory>), and if it does not have a parent (e.g. because it is the root directory), it will be represented as None. This allows the Directory struct to store a reference to its parent directory while still allowing for the possibility that the directory has no parent.

simonw commented 1 year ago

Impressive auto-completion suggestion form Copilot:

CleanShot 2022-12-06 at 21 46 10@2x
simonw commented 1 year ago

I want to return a new directory object, not change self here.

simonw commented 1 year ago

Spent a while messing around with this, but it looks like my understanding of reference v.s. value types in Rust (and stuff like Box) isn't up to scratch yet - I can't get anything through the compiler.

Got as far as this:

use std::fs;

#[derive(Debug)]
struct File {
    filename: String,
    length: u16,
}

#[derive(Debug)]
struct Directory {
    name: String,
    parent: Option<Box<Directory>>,
    // List of references to directories
    children: Vec<Box<Directory>>,
    files: Vec<Box<File>>
}

impl Directory {
    // get_or_create_subdirectory() method for Directory
    fn get_or_create_subdirectory(self, name: String) -> &'static mut Box<Directory> {
        // Check if the subdirectory exists
        let mut exists = false;
        for child in &self.children {
            if child.name == name {
                exists = true;
                break;
            }
        }
        // If it does, return a reference to it
        if exists {
            for child in &mut self.children {
                if child.name == name {
                    return child;
                }
            }
        } else {
            // If it doesn't, create it
            let new_dir = Directory {
                name: name,
                parent: Some(Box::new(self)),
                children: Vec::new(),
                files: Vec::new()
            };
            let new_dir_boxed = Box::new(new_dir);
            self.children.push(new_dir_boxed);
            return &mut new_dir_boxed;
        }
        ()
    }
}

fn main() {
    let file_contents = fs::read_to_string("example.txt").unwrap();

    let mut root = Directory {
        name: String::from(""),
        parent: None,
        children: Vec::new(),
        files: Vec::new()
    };
    let mut current_directory = &mut root;

    for line in file_contents.lines() {
        match line {
            // $ cd X
            line if line.starts_with("$ cd") => {
                let path = line.split(" ").nth(1).unwrap();
                if path == ".." {
                    current_directory = current_directory.parent.as_mut().unwrap();
                } else {
                    current_directory = current_directory.get_or_create_subdirectory(path);
                }
                println!("change directory: {}", path);
            }
        }
    }
}

Which returns these errors:

day_07 % cargo run
   Compiling day_07 v0.1.0 (/Users/simon/Dropbox/Development/advent-of-code-2022-in-rust/day_07)
error[E0308]: mismatched types
  --> src/main.rs:48:9
   |
20 |     fn get_or_create_subdirectory(self, name: String) -> &'static mut Box<Directory> {
   |                                                          --------------------------- expected `&'static mut Box<Directory>` because of return type
...
48 |         ()
   |         ^^ expected mutable reference, found `()`
   |
   = note: expected mutable reference `&'static mut Box<Directory>`
                      found unit type `()`

error[E0308]: mismatched types
   --> src/main.rs:106:86
    |
106 |                     current_directory = current_directory.get_or_create_subdirectory(path);
    |                                                                                      ^^^^- help: try using a conversion method: `.to_string()`
    |                                                                                      |
    |                                                                                      expected struct `String`, found `&str`

For more information about this error, try `rustc --explain E0308`.
error: could not compile `day_07` due to 2 previous errors

But that's after a few rounds through the compiler where I added &mut and similar in a bunch of places without understanding why.

I think I should read some of the Rust book at this point!

simonw commented 1 year ago

My hunch is that I'm hitting the limits of what a large language model can do with Rust here - the subtleties of & may be beyond what it can do in terms of correctly generating code.

Which is frustrating because they are currently beyond me too!

simonw commented 1 year ago

I think I need to read and understand these two chapters:

simonw commented 1 year ago

Notes on chapter 4:

Ownership is Rust’s most unique feature and has deep implications for the rest of the language. It enables Rust to make memory safety guarantees without needing a garbage collector

[...]

Some languages have garbage collection that regularly looks for no-longer used memory as the program runs; in other languages, the programmer must explicitly allocate and free the memory. Rust uses a third approach: memory is managed through a system of ownership with a set of rules that the compiler checks. If any of the rules are violated, the program won’t compile. None of the features of ownership will slow down your program while it’s running.

[...]

  • Each value in Rust has an owner.
  • There can only be one owner at a time.
  • When the owner goes out of scope, the value will be dropped.
simonw commented 1 year ago

Interesting: these are different:

let s = "string literal"; // This has type &str
let s = String::from("hello"); // This has type String

A String is allocated on the heap. When the variable that owns it goes out of scope that memory is released.

When a variable goes out of scope, Rust calls a special function for us. This function is called drop, and it's where the author of String can put the code to return the memory. Rust calls drop automatically at the closing curly bracket.

simonw commented 1 year ago
    let s1 = String::from("hello");
    let s2 = s1;

This creates a copy of the length and capacity integers for the string, and copies the pointer to its content but does not create a duplicate of that content.

image

After the above code executes, s2 has taken ownership of the string. Reference to s1 are no longer valid and will return an error.

This is to ensure Rust can safely release the memory for the string when the variable that owns it goes out of scope.

The resulting error message would look like this:

error[E0382]: borrow of moved value: `s1`
 --> src/main.rs:5:28
  |
2 |     let s1 = String::from("hello");
  |         -- move occurs because `s1` has type `String`, which does not implement the `Copy` trait
3 |     let s2 = s1;
  |              -- value moved here
4 | 
5 |     println!("{}, world!", s1);
  |                            ^^ value borrowed here after move

So that's what "borrow of moved value" means!

If you've heard the terms shallow copy and deep copy while working with other languages, the concept of copying the pointer, length, and capacity without copying the data probably sounds like making a shallow copy. But because Rust also invalidates the first variable, instead of calling it a shallow copy, it's known as a move.

Rust will never automatically create "deep" copies of your data. Therefore, any automatic copying can be assumed to be inexpensive in terms of runtime performance.

    let s2 = s1.clone();

The .clone() method DOES copy the string contents and create a new string.

simonw commented 1 year ago

Rust has a special annotation called the Copy trait that we can place on types that are stored on the stack, as integers are (we'll talk more about traits in Chapter 10). If a type implements the Copy trait, variables that use it do not move, but rather are trivially copied, making them still valid after assignment to another variable.

Rust won’t let us annotate a type with Copy if the type, or any of its parts, has implemented the Drop trait.

Notable, String does NOT implement Copy - it implements the Clone trait instead.

simonw commented 1 year ago

I asked ChatGPT:

what traits does String implement?

And got back:

The String type in Rust implements a number of traits, including the following:

  • Clone: This trait allows a String to be cloned, creating a new independent String with the same data.
  • Debug: This trait allows a String to be printed in a human-readable form for debugging purposes.
  • Display: This trait allows a String to be converted into a string that can be displayed to the user.
  • Eq: This trait allows two Strings to be compared for equality.
  • Hash: This trait allows a String to be used as a key in a HashMap.
  • Ord: This trait allows Strings to be compared and ordered based on their values.
  • PartialEq: This trait allows two Strings to be compared for partial equality.
  • PartialOrd: This trait allows Strings to be compared and ordered based on only a subset of their values.

I double-checked against the documentation and it looks like that list is very incomplete - though it did say "including the following" and appears to have picked the most interesting ones.

https://doc.rust-lang.org/std/string/struct.String.html#trait-implementations

simonw commented 1 year ago

This is useful, from https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html#ownership-and-functions

fn main() {
    let s = String::from("hello");  // s comes into scope

    takes_ownership(s);             // s's value moves into the function...
                                    // ... and so is no longer valid here

    let x = 5;                      // x comes into scope

    makes_copy(x);                  // x would move into the function,
                                    // but i32 is Copy, so it's okay to still
                                    // use x afterward

} // Here, x goes out of scope, then s. But because s's value was moved, nothing
  // special happens.

fn takes_ownership(some_string: String) { // some_string comes into scope
    println!("{}", some_string);
} // Here, some_string goes out of scope and `drop` is called. The backing
  // memory is freed.

fn makes_copy(some_integer: i32) { // some_integer comes into scope
    println!("{}", some_integer);
} // Here, some_integer goes out of scope. Nothing special happens.
simonw commented 1 year ago

What if we want to let a function use a value but not take ownership? [...] Rust has a feature for using a value without transferring ownership, called references

https://doc.rust-lang.org/book/ch04-02-references-and-borrowing.html

simonw commented 1 year ago

A reference is like a pointer in that it's an address we can follow to access the data stored at that address; that data is owned by some other variable. Unlike a pointer, a reference is guaranteed to point to a valid value of a particular type for the life of that reference.

fn main() {
    let s1 = String::from("hello");

    let len = calculate_length(&s1);

    println!("The length of '{}' is {}.", s1, len);
}

fn calculate_length(s: &String) -> usize {
    s.len()
}
simonw commented 1 year ago

So crucially I had thought that calling a function with &s1 was different in that it would pass by reference instead of passing by value - but actually it relates to ownership for memory management, not to a difference in sending a new copy v.s. sending a pointer. A pointer is passed in both cases.

We call the action of creating a reference borrowing. As in real life, if a person owns something, you can borrow it from them. When you're done, you have to give it back. You don't own it.

simonw commented 1 year ago

This next bit is really important: it's what was tripping me up earlier.

Just as variables are immutable by default, so are references. We’re not allowed to modify something we have a reference to.

This code has that bug:

fn main() {
    let s = String::from("hello");

    change(&s);
}

fn change(some_string: &String) {
    some_string.push_str(", world");
}

Error message is:

error[E0596]: cannot borrow `*some_string` as mutable, as it is behind a `&` reference
 --> src/main.rs:8:5
  |
7 | fn change(some_string: &String) {
  |                        ------- help: consider changing this to be a mutable reference: `&mut String`
8 |     some_string.push_str(", world");
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `some_string` is a `&` reference, so the data it refers to cannot be borrowed as mutable

The fix is to use a mutable reference:

fn main() {
    let mut s = String::from("hello");

    change(&mut s);
}

fn change(some_string: &mut String) {
    some_string.push_str(", world");
}

Mutable references have one big restriction: if you have a mutable reference to a value, you can have no other references to that value.

I asked ChatGPT:

"Mutable references have one big restriction: if you have a mutable reference to a value, you can have no other references to that value."

Why is that?

It replied:

The reason for this restriction is to prevent data races and other concurrency errors. In Rust, a data race occurs when two or more threads are trying to access and modify the same data simultaneously, without any mechanism for ensuring that the accesses and modifications are properly synchronized. This can lead to unpredictable and potentially incorrect behavior, such as one thread overwriting the changes made by another.

By allowing only one mutable reference to a value at a time, Rust's borrowing rules ensure that there is only one place where a value can be modified, and that other threads can't access the value while it's being modified. This eliminates the possibility of data races and other concurrency errors, and helps ensure that Rust programs are safe and correct.

simonw commented 1 year ago

And sure enough, the Rust book says this a few paragraphs later:

The restriction preventing multiple mutable references to the same data at the same time allows for mutation but in a very controlled fashion. It's something that new Rustaceans struggle with, because most languages let you mutate whenever you'd like. The benefit of having this restriction is that Rust can prevent data races at compile time. A data race is similar to a race condition and happens when these three behaviors occur:

  • Two or more pointers access the same data at the same time.
  • At least one of the pointers is being used to write to the data.
  • There's no mechanism being used to synchronize access to the data.

Data races cause undefined behavior and can be difficult to diagnose and fix when you're trying to track them down at runtime; Rust prevents this problem by refusing to compile code with data races!

simonw commented 1 year ago

We also cannot have a mutable reference while we have an immutable one to the same value.

Users of an immutable reference don’t expect the value to suddenly change out from under them! However, multiple immutable references are allowed because no one who is just reading the data has the ability to affect anyone else’s reading of the data.

simonw commented 1 year ago

I skipped the sub-chapter on Slices for the moment, can revisit when I need to know how those work: https://doc.rust-lang.org/book/ch04-03-slices.html

simonw commented 1 year ago

Bad news is I might have to understand lifetimes for this exercise too - the https://doc.rust-lang.org/book/ch05-01-defining-structs.html ends with this note:

In Chapter 10, we'll discuss how to fix these errors so you can store references in structs, but for now, we'll fix errors like these using owned types like String instead of references like &str.

Do I need the ability to store references in my Directory struct, or can I get away with just "owned types"?

simonw commented 1 year ago

I asked ChatGPT "In Rust explain the difference between references and owned types" and it reminded me that:

Owned types, on the other hand, are values that are stored directly on the stack and are owned by the current function or block of code. These types do not need to be borrowed with a reference, as they are stored directly in memory and can be accessed and modified in place.

simonw commented 1 year ago

A note from https://doc.rust-lang.org/book/ch05-03-method-syntax.html on why struct methods use &self and not just self:

We've chosen &self here for the same reason we used &Rectangle in the function version: we don't want to take ownership, and we just want to read the data in the struct, not write to it. If we wanted to change the instance that we've called the method on as part of what the method does, we'd use &mut self as the first parameter. Having a method that takes ownership of the instance by using just self as the first parameter is rare; this technique is usually used when the method transforms self into something else and you want to prevent the caller from using the original instance after the transformation.

simonw commented 1 year ago

https://rust-leipzig.github.io/architecture/2016/12/20/idiomatic-trees-in-rust/

Writing tree structures in Rust is no trivial problem.

So at least this is known to be hard!

That post (from 2016) suggests a pattern where tree nodes reference each other by integer ID and maintain an "arena" vector that lists all of the nodes in one place.

Here's a crate implementing that pattern: https://github.com/saschagrunert/indextree

This library looks popular and uses similar ideas: https://lib.rs/crates/id_tree - top search result for "tree" here, which orders by popularity: https://lib.rs/search?q=tree

simonw commented 1 year ago

https://blog.logrocket.com/guide-using-arenas-rust/ says:

Data structures that have elements pointing to other elements are notoriously hard to implement and use in Rust. Seriously --- there's a whole book about implementing various linked lists in Rust!

The LogRocket article goes on to show how various crates for arena-based memory management are used.

simonw commented 1 year ago

https://rust-unofficial.github.io/too-many-lists/first-layout.html is really useful.

simonw commented 1 year ago

https://fasterthanli.me/series/advent-of-code-2022/part-7 is out and has lots of clues. In particular:

One pattern that's commonly used in Rust when tackling a problem that the borrow checker doesn't quite understand, is to put everything in a big Vec and use indices.

It shows how to use id_tree for that but I'm tempted to try and roll a solution like that myself.

simonw commented 1 year ago

OK, now I'm getting somewhere:

use std::fs;

#[derive(Debug)]
struct File {
    filename: String,
    length: usize,
}

#[derive(Debug)]
struct Directory {
    name: String,
    // List of references to directories
    directories: Vec<usize>,
    // List of references to files
    files: Vec<usize>
}

fn main() {
    let file_contents = fs::read_to_string("example.txt").unwrap();

    let mut directories = vec![
        Directory {
            name: String::from(""),
            directories: Vec::new(),
            files: Vec::new()
        }
    ];
    let mut files: Vec<File> = Vec::new();

    let mut current_directory_idx: usize = 0;

    println!("Current directory index: {}, current directory: {:?}", current_directory_idx, directories[current_directory_idx]);

    for line in file_contents.lines() {
        match line {
            // $ cd X
            line if line.starts_with("$ cd") => {
                let path = line.split(" ").nth(2).unwrap();
                println!("change directory: '{}'", path);
                if path == "/" {
                    current_directory_idx = 0;
                } else if path == ".." {
                    // Find directory for which current_directory_idx is in its directories
                    for (i, directory) in directories.iter().enumerate() {
                        if directory.directories.contains(&current_directory_idx) {
                            current_directory_idx = i;
                            break;
                        }
                    }
                    println!("cd .. got us to '{}'", directories[current_directory_idx].name);
                } else {
                    // Cd to this directory
                    let mut found = false;
                    for dir_idx in &directories[current_directory_idx].directories {
                        if directories[*dir_idx].name == path {
                            current_directory_idx = *dir_idx;
                            found = true;
                            break;
                        }
                    }
                    if !found {
                        // Create the directory
                        let new_dir = Directory {
                            name: String::from(path),
                            directories: Vec::new(),
                            files: Vec::new()
                        };
                        directories.push(new_dir);
                        let new_dir_idx = directories.len() - 1;
                        directories[current_directory_idx].directories.push(new_dir_idx);
                        current_directory_idx = new_dir_idx;
                    }
                }
            }
            // Ignore ls
            line if line.starts_with("$ ls") => {
                println!("ls");
            }
            // integer_file_size filename
            line if line.split(" ").nth(0).unwrap().parse::<usize>().is_ok() => {
                let file_size = line.split(" ").nth(0).unwrap().parse::<usize>().unwrap();
                let filename = line.split(" ").nth(1).unwrap();
                println!("file: '{}', size: {}", filename, file_size);
                let new_file = File {
                    filename: String::from(filename),
                    length: file_size
                };
                files.push(new_file);
                directories[current_directory_idx].files.push(files.len() - 1);
            }
            line => {
                println!("line: {}", line);
                let bits = line.split(" ").collect::<Vec<&str>>();
                println!(" . bits: {:?}", bits);
            }
        }
    }
    // Debug print files
    println!("Files:");
    for file in &files {
        println!(" . {:?}", file);
    }
    // Debug print directories
    println!("Directories:");
    for directory in &directories {
        println!(" . {:?}", directory);
    }
}

Output:

day_07 % cargo run
   Compiling day_07 v0.1.0 (/Users/simon/Dropbox/Development/advent-of-code-2022-in-rust/day_07)
warning: field is never read: `filename`
 --> src/main.rs:5:5
  |
5 |     filename: String,
  |     ^^^^^^^^^^^^^^^^
  |
  = note: `#[warn(dead_code)]` on by default
note: `File` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis
 --> src/main.rs:3:10
  |
3 | #[derive(Debug)]
  |          ^^^^^
  = note: this warning originates in the derive macro `Debug` (in Nightly builds, run with -Z macro-backtrace for more info)

warning: field is never read: `length`
 --> src/main.rs:6:5
  |
6 |     length: usize,
  |     ^^^^^^^^^^^^^
  |
note: `File` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis
 --> src/main.rs:3:10
  |
3 | #[derive(Debug)]
  |          ^^^^^
  = note: this warning originates in the derive macro `Debug` (in Nightly builds, run with -Z macro-backtrace for more info)

warning: `day_07` (bin "day_07") generated 2 warnings
    Finished dev [unoptimized + debuginfo] target(s) in 0.59s
     Running `target/debug/day_07`
Current directory index: 0, current directory: Directory { name: "", directories: [], files: [] }
change directory: '/'
ls
line: dir a
 . bits: ["dir", "a"]
file: 'b.txt', size: 14848514
file: 'c.dat', size: 8504156
line: dir d
 . bits: ["dir", "d"]
change directory: 'a'
ls
line: dir e
 . bits: ["dir", "e"]
file: 'f', size: 29116
file: 'g', size: 2557
file: 'h.lst', size: 62596
change directory: 'e'
ls
file: 'i', size: 584
change directory: '..'
cd .. got us to 'a'
change directory: '..'
cd .. got us to ''
change directory: 'd'
ls
file: 'j', size: 4060174
file: 'd.log', size: 8033020
file: 'd.ext', size: 5626152
file: 'k', size: 7214296
Files:
 . File { filename: "b.txt", length: 14848514 }
 . File { filename: "c.dat", length: 8504156 }
 . File { filename: "f", length: 29116 }
 . File { filename: "g", length: 2557 }
 . File { filename: "h.lst", length: 62596 }
 . File { filename: "i", length: 584 }
 . File { filename: "j", length: 4060174 }
 . File { filename: "d.log", length: 8033020 }
 . File { filename: "d.ext", length: 5626152 }
 . File { filename: "k", length: 7214296 }
Directories:
 . Directory { name: "", directories: [1, 3], files: [0, 1] }
 . Directory { name: "a", directories: [2], files: [2, 3, 4] }
 . Directory { name: "e", directories: [], files: [5] }
 . Directory { name: "d", directories: [], files: [6, 7, 8, 9] }

I wrote this with a bunch of Copilot assistance, prompted by comments.

simonw commented 1 year ago

Got the right result for example.txt:

Total sizes: [48381165, 94853, 584, 24933642]
Total size of small directories: 95437
simonw commented 1 year ago

https://github.com/simonw/advent-of-code-2022-in-rust/blob/9d432c1959e0dd0382eca71d584a77ba70cb4dea/day_07/src/main.rs#L1-L136

919137 is the right answer for input.txt.

simonw commented 1 year ago

Part 2:

The total disk space available to the filesystem is 70000000. To run the update, you need unused space of at least 30000000. You need to find a directory you can delete that will free up enough space to run the update.

[...]

Find the smallest directory that, if deleted, would free up enough space on the filesystem to run the update. What is the total size of that directory?

simonw commented 1 year ago

So free space = 70000000 - total space - I need to find the smallest directory that will get that to under 30000000 if I delete it.

simonw commented 1 year ago

Got to the right answer for the example data:

Part 2
======

Total size of everything: 48381165
Directory sizes: [(0, "", 48381165), (1, "a", 94853), (2, "e", 584), (3, "d", 24933642)]
Space needed to free: 18381165
Candidates: [(0, "", 48381165), (3, "d", 24933642)]
Sorted candidates: [(3, "d", 24933642), (0, "", 48381165)]
So the directory to delete is: (3, "d", 24933642)
simonw commented 1 year ago

Against input.txt I got:

Total size of everything: 42805968
Directory sizes: [(0, "", 42805968), (1, "cmwrq", 10449487), (2, "dtbzzl", 2919892), (3, "wwpnn", 2877389), (4, "lwqgsbg", 1927095), (5, "dtbzzl", 252091), (6, "pjts", 374873), (7, "sbvljdh", 470307), (8, "rdrqc", 242263), (9, "rtfpcswj", 228044), (10, "nztw", 30777), (11, "rdtjztmt", 736944), (12, "rzbb", 155253), (13, "ssdlqcrw", 22423), (14, "pjnghbm", 6096648), (15, "dtbzzl", 4008321), (16, "hlw", 2490397), (17, "lfqqrp", 2177388), (18, "mbmfpz", 795801), (19, "fsrbwl", 733934), (20, "ltsj", 51204), (21, "ssdlqcrw", 64928), (22, "lsmpw", 61867), (23, "mdhfdlw", 331904), (24, "pjts", 846114), (25, "pjts", 583150), (26, "qzs", 11155), (27, "ssdlqcrw", 192414), (28, "nztw", 313009), (29, "nztw", 550260), (30, "pvsjpn", 247651), (31, "pjts", 1693637), (32, "btc", 5264), (33, "tpwcmvch", 1429016), (34, "nlrq", 478350), (35, "dtbzzl", 271143), (36, "wvnqzjf", 64128), (37, "vhvn", 187526), (38, "vrt", 1069296), (39, "drrnm", 250912), (40, "fqr", 409913), (41, "nztw", 9998), (42, "bqf", 9998), (43, "zgrjmtcq", 109025), (44, "ftrccld", 17428128), (45, "dtbzzl", 500982), (46, "fgwh", 500982), (47, "dpdvswq", 500982), (48, "jsstq", 252517), (49, "fvmh", 5973492), (50, "djcn", 8600), (51, "dtbzzl", 2623574), (52, "sppdjcm", 1601353), (53, "fghb", 368529), (54, "pfjdsm", 488529), (55, "pjts", 191709), (56, "vtnzqtvj", 1022221), (57, "rrl", 1022221), (58, "lzlswv", 274776), (59, "dmh", 274776), (60, "sjsqnvq", 221327), (61, "mvppnhr", 1735755), (62, "ldwv", 161179), (63, "rmdjdqvl", 365329), (64, "gnztqmhv", 176043), (65, "lpmhfr", 114477), (66, "jrrdsd", 114477), (67, "tphjm", 74809), (68, "tmj", 557891), (69, "vhvn", 475203), (70, "zdqlwnc", 986590), (71, "fbhcv", 368096), (72, "rsdltnvc", 275370), (73, "fwztt", 5680367), (74, "ssdlqcrw", 837222), (75, "bzd", 256105), (76, "vvzsr", 3908141), (77, "bcdqrs", 433473), (78, "ssdlqcrw", 293104), (79, "vhvn", 178932), (80, "wthshgg", 2658141), (81, "dppwvtmp", 275481), (82, "ljgszd", 125627), (83, "vbjsmgp", 236951), (84, "vzqc", 853242), (85, "vhvn", 199196), (86, "zmpdrpd", 658741), (87, "lsvvn", 282867), (88, "pjts", 40866), (89, "qglhlggq", 3127580), (90, "vhvn", 2290224), (91, "dtbzzl", 72812), (92, "mdng", 1196426), (93, "gdqqtvnp", 292349), (94, "nrmhpblm", 130971), (95, "vhvn", 274549), (96, "ptqq", 424417), (97, "wpvq", 173437), (98, "rslphgp", 641688), (99, "vhvn", 289195), (100, "ssdlqcrw", 289195), (101, "jjlbmtw", 550900), (102, "jpncfpb", 1662116), (103, "hjgwcmh", 1002118), (104, "fmst", 655150), (105, "gzjq", 254691), (106, "prjqfwf", 151003), (107, "ssdlqcrw", 103688), (108, "vhvn", 113113), (109, "mddr", 134280), (110, "qpfjp", 134280), (111, "cfhv", 134280), (112, "ssdlqcrw", 134280), (113, "mthvntdd", 11686814), (114, "bcdcz", 1540498), (115, "lbvbc", 328642), (116, "tvgclpsc", 359720), (117, "cngbf", 2688184), (118, "hstph", 172788), (119, "qrnpjz", 2465255), (120, "blzc", 108342), (121, "rvl", 774194), (122, "bcrf", 325278), (123, "dcb", 18316), (124, "zlgjzcjv", 18316), (125, "gnhvtgm", 110236), (126, "sjbr", 448916), (127, "zvhtzqqc", 1582719), (128, "fglfpn", 39153), (129, "gtzrq", 60625), (130, "hfgdcf", 36016), (131, "msc", 742497), (132, "hrz", 528016), (133, "vlhllqz", 157880), (134, "ssdlqcrw", 429451), (135, "mqjjbq", 4254868), (136, "fmvp", 2674857), (137, "dhnn", 232575), (138, "dlcvwqw", 285116), (139, "jzqt", 1284585), (140, "nsvlqq", 325305), (141, "vmp", 104275), (142, "nvsdbjj", 400818), (143, "pjts", 111715), (144, "rhldnjlt", 788698), (145, "ffhcbvmf", 247668), (146, "vprlq", 541030), (147, "nhblb", 1922310), (148, "nfwl", 989965), (149, "lqs", 700074), (150, "mbcft", 272856), (151, "ntmvt", 262298), (152, "nztw", 164920), (153, "mlvnlz", 289891), (154, "ptc", 190002), (155, "rng", 39093), (156, "vhvn", 275854), (157, "vlj", 222492), (158, "qqqrm", 19102), (159, "wcpllh", 19102), (160, "pwg", 19102), (161, "ssdlqcrw", 181610), (162, "ttvgr", 332119), (163, "vpcpd", 332119), (164, "vdmm", 95079), (165, "wnhnwjm", 309278), (166, "zdvbsb", 293736), (167, "ptzsl", 26404), (168, "wmqc", 812195), (169, "dtbzzl", 189696), (170, "hpzgnb", 189696), (171, "hdzmzc", 143510), (172, "nmmpwqvz", 276725), (173, "qjnm", 202264)]
Space needed to free: 12805968
Candidates: [(0, "", 42805968), (44, "ftrccld", 17428128)]
Sorted candidates: [(44, "ftrccld", 17428128), (0, "", 42805968)]
So the directory to delete is: (44, "ftrccld", 17428128)

Tried 17428128 and it said:

That's not the right answer; your answer is too high

simonw commented 1 year ago

That output looks wrong to me. I think there are directories I can delete to free up 12805968 other than the two it suggested.

simonw commented 1 year ago

Spotted the fix:

index ba670d3..540141d 100644
--- a/day_07/src/main.rs
+++ b/day_07/src/main.rs
@@ -165,8 +165,8 @@ fn main() {
     println!("Directory sizes: {:?}", directory_sizes);

     // Need 30000000 free to run update, which directories could we delete to get that?
-    let free_space = 30000000;
-    let space_needed_to_free = total_size_of_everything - free_space;
+    let space_needed = 70000000 - 30000000;
+    let space_needed_to_free = total_size_of_everything - space_needed;
     println!("Space needed to free: {}", space_needed_to_free);

     let candidates = directory_sizes
simonw commented 1 year ago

This gave me the correct answer:

Space needed to free: 2805968
Candidates: [(0, "", 42805968), (1, "cmwrq", 10449487), (2, "dtbzzl", 2919892), (3, "wwpnn", 2877389), (14, "pjnghbm", 6096648), (15, "dtbzzl", 4008321), (44, "ftrccld", 17428128), (49, "fvmh", 5973492), (73, "fwztt", 5680367), (76, "vvzsr", 3908141), (89, "qglhlggq", 3127580), (113, "mthvntdd", 11686814), (135, "mqjjbq", 4254868)]
Sorted candidates: [(3, "wwpnn", 2877389), (2, "dtbzzl", 2919892), (89, "qglhlggq", 3127580), (76, "vvzsr", 3908141), (15, "dtbzzl", 4008321), (135, "mqjjbq", 4254868), (73, "fwztt", 5680367), (49, "fvmh", 5973492), (14, "pjnghbm", 6096648), (1, "cmwrq", 10449487), (113, "mthvntdd", 11686814), (44, "ftrccld", 17428128), (0, "", 42805968)]
So the directory to delete is: (3, "wwpnn", 2877389)
simonw commented 1 year ago

https://github.com/simonw/advent-of-code-2022-in-rust/blob/5cccef55411769193f4fa33156ee7476fddb2517/day_07/src/main.rs#L1-L220

That one was really hard. I'd like to do a tree in Rust some day without needing to cheat and use a vector and vector indexes, but I'm glad I got there. I also wrote some functions to help along the way, which was nice.

Copilot did write some of the gnarlier code for me though, in particular this bit:

    let candidates = directory_sizes
        .iter()
        .filter(|(_, _, size)| *size >= space_needed_to_free)
        .collect::<Vec<_>>();

    // Sort candidates by third value in tuple
    let mut candidates = candidates;
    candidates.sort_by(|(_, _, size1), (_, _, size2)| size1.cmp(size2));
    println!("Sorted candidates: {:?}", candidates);
bmon commented 1 year ago

Great job solving this. I'm also new to rust and ran into the same problems you had. I ended up with a big hashmap that owned the directories which I passed around. Your research was excellent and helped me understand why I had so much trouble with this, and also that my solution wasn't so bad after all.

Thank you very much for documenting and sharing your process like this!